

Revised Second Edition 


Kai Lai Chung 


A 
COURSE IN 
PROBABILITY 
THEORY 


THIRD EDITION 


Kai Lai Chung 
Stanford University 


ACADEMIC PRESS 


A Harcourt Science and Technology Company 


San Diego San Francisco New York 
Boston London Sydney Tokyo 


This book is printed on acid-free paper.


COPYRIGHT © 2001, 1974, 1968 BY ACADEMIC PRESS

ALL RIGHTS RESERVED. 

NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY 
MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION 
STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER. 


Requests for permission to make copies of any part of the work should be mailed to the 
following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, 
Florida 32887-6777. 


ACADEMIC PRESS 


A Harcourt Science and Technology Company 
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA 
http://www.academicpress.com 


ACADEMIC PRESS 
Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK 
http://www.academicpress.com 


Library of Congress Cataloging in Publication Data: 00-106712 
International Standard Book Number: 0-12-174151-6 


PRINTED IN THE UNITED STATES OF AMERICA 
00 01 02 03 IP 987654321 


Contents 


Preface to the third edition
Preface to the second edition
Preface to the first edition


1 | Distribution function

1.1 Monotone functions
1.2 Distribution functions
1.3 Absolutely continuous and singular distributions


2 | Measure theory

2.1 Classes of sets
2.2 Probability measures and their distribution functions


3 | Random variable. Expectation. Independence

3.1 General definitions
3.2 Properties of mathematical expectation
3.3 Independence


4 | Convergence concepts

4.1 Various modes of convergence
4.2 Almost sure convergence; Borel–Cantelli lemma
4.3 Vague convergence
4.4 Continuation
4.5 Uniform integrability; convergence of moments


5 | Law of large numbers. Random series

5.1 Simple limit theorems
5.2 Weak law of large numbers
5.3 Convergence of series
5.4 Strong law of large numbers
5.5 Applications
Bibliographical Note


6 | Characteristic function

6.1 General properties; convolutions
6.2 Uniqueness and inversion
6.3 Convergence theorems
6.4 Simple applications
6.5 Representation theorems
6.6 Multidimensional case; Laplace transforms
Bibliographical Note


7 | Central limit theorem and its ramifications

7.1 Liapounov’s theorem
7.2 Lindeberg–Feller theorem
7.3 Ramifications of the central limit theorem
7.4 Error estimation
7.5 Law of the iterated logarithm
7.6 Infinite divisibility
Bibliographical Note


8 | Random walk

8.1 Zero-or-one laws
8.2 Basic notions
8.3 Recurrence
8.4 Fine structure
8.5 Continuation
Bibliographical Note


9 | Conditioning. Markov property. Martingale

9.1 Basic properties of conditional expectation
9.2 Conditional independence; Markov property
9.3 Basic properties of smartingales
9.4 Inequalities and convergence
9.5 Applications
Bibliographical Note


Supplement: Measure and Integral

1 Construction of measure
2 Characterization of extensions
3 Measures in R
4 Integral
5 Applications


General Bibliography


Index


Preface to the third edition 


In this new edition, I have added a Supplement on Measure and Integral. 
The subject matter is first treated in a general setting pertinent to an abstract 
measure space, and then specified in the classic Borel-Lebesgue case for the 
real line. The latter material, an essential part of real analysis, is presupposed 
in the original edition published in 1968 and revised in the second edition 
of 1974. When I taught the course under the title “Advanced Probability” 
at Stanford University beginning in 1962, students from the departments of 
Statistics, operations research (formerly industrial engineering), electrical engi- 
neering, etc. often had to take a prerequisite course given by other instructors 
before they enlisted in my course. In later years I prepared a set of notes, 
lithographed and distributed in the class, to meet the need. This forms the 
basis of the present Supplement. It is hoped that the result may as well serve 
in an introductory mode, perhaps also independently for a short course in the 
stated topics. 

The presentation is largely self-contained with only a few particular refer- 
ences to the main text. For instance, after (the old) §2.1 where the basic notions 
of set theory are explained, the reader can proceed to the first two sections of 
the Supplement for a full treatment of the construction and completion of a 
general measure; the next two sections contain a full treatment of the mathe- 
matical expectation as an integral, of which the properties are recapitulated in 
§3.2. In the final section, application of the new integral to the older Riemann 
integral in calculus is described and illustrated with some famous examples. 
Throughout the exposition, a few side remarks, pedagogic, historical, even 
judgmental, of the kind I used to drop in the classroom, are approximately
reproduced. 

In drafting the Supplement, I consulted Patrick Fitzsimmons on several 
occasions for support. Giorgio Letta and Bernard Bru gave me encouragement 
for the uncommon approach to Borel’s lemma in §3, for which the usual proof 
always left me disconsolate as being too devious for the novice’s appreciation. 

A small number of additional remarks and exercises have been added to 
the main text. 

Warm thanks are due: to Vanessa Gerhard of Academic Press who deciphered
my handwritten manuscript with great ease and care; to Isolde Field
of the Mathematics Department for unfailing assistance; to Jim Luce for a
mission accomplished. Last and evidently not least, my wife and my daughter 
Corinna performed numerous tasks indispensable to the undertaking of this 
publication. 


Preface to the second edition 


This edition contains a good number of additions scattered throughout the 
book as well as numerous voluntary and involuntary changes. The reader who 
is familiar with the first edition will have the joy (or chagrin) of spotting new 
entries. Several sections in Chapters 4 and 9 have been rewritten to make the 
material more adaptable to application in stochastic processes. Let me reiterate 
that this book was designed as a basic study course prior to various possible 
specializations. There is enough material in it to cover an academic year in 
class instruction, if the contents are taken seriously, including the exercises. 
On the other hand, the ordering of the topics may be varied considerably to 
suit individual tastes. For instance, Chapters 6 and 7 dealing with limiting 
distributions can be easily made to precede Chapter 5 which treats almost 
sure convergence. A specific recommendation is to take up Chapter 9, where 
conditioning makes a belated appearance, before much of Chapter 5 or even 
Chapter 4. This would be more in the modern spirit of an early weaning from 
the independence concept, and could be followed by an excursion into the 
Markovian territory. 

Thanks are due to many readers who have told me about errors, obscuri- 
ties, and inanities in the first edition. An incomplete record includes the names 
below (with apology for forgotten ones): Geoff Eagleson, Z. Govindarajulu, 
David Heath, Bruce Henry, Donald Iglehart, Anatole Joffe, Joseph Marker, 
P. Masani, Warwick Millar, Richard Olshen, S. M. Samuels, David Siegmund, 
T. Thedéen, A. González Villalobos, Michel Weil, and Ward Whitt. The
revised manuscript was checked in large measure by Ditlev Monrad. The 
galley proofs were read by David Kreps and myself independently, and it was
fun to compare scores and see who missed what. But since not all parts of 
the old text have undergone the same scrutiny, readers of the new edition 
are cordially invited to continue the fault-finding. Martha Kirtley and Joan 
Shepard typed portions of the new material. Gail Lemmond took charge of 
the final page-by-page revamping and it was through her loving care that the 
revision was completed on schedule. 

In the third printing a number of misprints and mistakes, mostly minor, are 
corrected. I am indebted to the following persons for some of these corrections: 
Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward 
Korn, Pierre van Moerbeke, David Siegmund. 

In the fourth printing, an oversight in the proof of Theorem 6.3.1 is 
corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification 
made in (VII) of Section 9.5. A number of minor misprints are also corrected. I
am indebted to several readers, including Asmussen, Robert, Schatte, Whitley 
and Yannaros, who wrote me about the text. 


Preface to the first edition 


A mathematics course is not a stockpile of raw material nor a random selection 
of vignettes. It should offer a sustained tour of the field being surveyed and 
a preferred approach to it. Such a course is bound to be somewhat subjective 
and tentative, neither stationary in time nor homogeneous in space. But it 
should represent a considered effort on the part of the author to combine his 
philosophy, conviction, and experience as to how the subject may be learned 
and taught. The field of probability is already so large and diversified that 
even at the level of this introductory book there can be many different views 
on orientation and development that affect the choice and arrangement of its 
content. The necessary decisions being hard and uncertain, one too often takes 
refuge by pleading a matter of “taste.” But there is good taste and bad taste 
in mathematics just as in music, literature, or cuisine, and one who dabbles in 
it must stand judged thereby. 

It might seem superfluous to emphasize the word “probability” in a book 
dealing with the subject. Yet on the one hand, one used to hear such specious 
utterance as “probability is just a chapter of measure theory”; on the other 
hand, many still use probability as a front for certain types of analysis such as 
combinatorial, Fourier, functional, and whatnot. Now a properly constructed 
course in probability should indeed make substantial use of these and other 
allied disciplines, and a strict line of demarcation need never be drawn. But 
PROBABILITY is still distinct from its tools and its applications not only in 
the final results achieved but also in the manner of proceeding. This is perhaps 
best seen in the advanced study of stochastic processes, but will already be
abundantly clear from the contents of a general introduction such as this book. 

Although many notions of probability theory arise from concrete models 
in applied sciences, recalling such familiar objects as coins and dice, genes 
and particles, a basic mathematical text (as this pretends to be) can no longer 
indulge in diverse applications, just as nowadays a course in real variables 
cannot delve into the vibrations of strings or the conduction of heat. Inciden- 
tally, merely borrowing the jargon from another branch of science without 
treating its genuine problems does not aid in the understanding of concepts or 
the mastery of techniques. 

A final disclaimer: this book is not the prelude to something else and does 
not lead down a strait and righteous path to any unique fundamental goal. 
Fortunately nothing in the theory deserves such single-minded devotion, as 
apparently happens in certain other fields of mathematics. Quite the contrary, 
a basic course in probability should offer a broad perspective of the open field 
and prepare the student for various further possibilities of study and research. 
To this aim he must acquire knowledge of ideas and practice in methods, and 
dwell with them long and deeply enough to reap the benefits. 

A brief description will now be given of the nine chapters, with some 
suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A 
synopsis of the requisite “measure and integration” is given in Chapter 2, 
together with certain supplements essential to probability theory. Chapter 1 is 
really a review of elementary real variables; although it is somewhat expend- 
able, a reader with adequate background should be able to cover it swiftly and 
confidently — with something gained from the effort. For class instruction it 
may be advisable to begin the course with Chapter 2 and fill in from Chapter 1 
as the occasions arise. Chapter 3 is the true introduction to the language and 
framework of probability theory, but I have restricted its content to what is 
crucial and feasible at this stage, relegating certain important extensions, such 
as shifting and conditioning, to Chapters 8 and 9. This is done to avoid over- 
loading the chapter with definitions and generalities that would be meaningless 
without frequent application. Chapter 4 may be regarded as an assembly of 
notions and techniques of real function theory adapted to the usage of proba- 
bility. Thus, Chapter 5 is the first place where the reader encounters bona fide 
theorems in the field. The famous landmarks shown there serve also to intro- 
duce the ways and means peculiar to the subject. Chapter 6 develops some of 
the chief analytical weapons, namely Fourier and Laplace transforms, needed 
for challenges old and new. Quick testing grounds are provided, but for major 
battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been 
called the “central problem” of classical probability theory. Time has marched 
on and the center of the stage has shifted, but this topic remains without 
doubt a crowning achievement. In Chapters 8 and 9 two different aspects of 
(discrete parameter) stochastic processes are presented in some depth. The
random walks in Chapter 8 illustrate the way probability theory transforms 
other parts of mathematics. It does so by introducing the trajectories of a 
process, thereby turning what was static into a dynamic structure. The same 
revolution is now going on in potential theory by the injection of the theory of 
Markov processes. In Chapter 9 we return to fundamentals and strike out in 
major new directions. While Markov processes can be barely introduced in the 
limited space, martingales have become an indispensable tool for any serious 
study of contemporary work and are discussed here at length. The fact that 
these topics are placed at the end rather than the beginning of the book, where 
they might very well be, testifies to my belief that the student of mathematics 
is better advised to learn something old before plunging into the new. 

A short course may be built around Chapters 2, 3, 4, selections from 
Chapters 5, 6, and the first one or two sections of Chapter 9. For a richer fare, 
substantial portions of the last three chapters should be given without skipping 
any one of them. In a class with solid background, Chapters 1, 2, and 4 need 
not be covered in detail. At the opposite end, Chapter 2 may be filled in with 
proofs that are readily available in standard texts. It is my hope that this book 
may also be useful to mature mathematicians as a gentle but not so meager 
introduction to genuine probability theory. (Often they stop just before things 
become interesting!) Such a reader may begin with Chapter 3, go at once to 
Chapter 5 with a few glances at Chapter 4, skim through Chapter 6, and take 
up the remaining chapters seriously to get a real feeling for the subject. 

Several cases of exclusion and inclusion merit special comment. I chose 
to construct only a sequence of independent random variables (in Section 3.3), 
rather than a more general one, in the belief that the latter is better absorbed in a 
course on stochastic processes. I chose to postpone a discussion of conditioning 
until quite late, in order to follow it up at once with varied and worthwhile 
applications. With a little reshuffling Section 9.1 may be placed right after 
Chapter 3 if so desired. I chose not to include a fuller treatment of infinitely 
divisible laws, for two reasons: the material is well covered in two or three 
treatises, and the best way to develop it would be in the context of the under- 
lying additive process, as originally conceived by its creator Paul Lévy. I 
took pains to spell out a peripheral discussion of the logarithm of charac- 
teristic function to combat the errors committed on this score by numerous 
existing books. Finally, and this is mentioned here only in response to a query 
by Doob, I chose to present the brutal Theorem 5.3.2 in the original form 
given by Kolmogorov because I want to expose the student to hardships in 
mathematics. 

There are perhaps some new things in this book, but in general I have 
not striven to appear original or merely different, having at heart the interests 
of the novice rather than the connoisseur. In the same vein, I favor as a 
rule of writing (euphemistically called “style”) clarity over elegance. In my
opinion the slightly decadent fashion of conciseness has been overwrought, 
particularly in the writing of textbooks. The only valid argument I have heard 
for an excessively terse style is that it may encourage the reader to think for 
himself. Such an effect can be achieved equally well, for anyone who wishes 
it, by simply omitting every other sentence in the unabridged version. 

This book contains about 500 exercises consisting mostly of special cases 
and examples, second thoughts and alternative arguments, natural extensions, 
and some novel departures. With a few obvious exceptions they are neither 
profound nor trivial, and hints and comments are appended to many of them. 
If they tend to be somewhat inbred, at least they are relevant to the text and 
should help in its digestion. As a bold venture I have marked a few of them 
with * to indicate a “must,” although no rigid standard of selection has been 
used. Some of these are needed in the book, but in any case the reader’s study 
of the text will be more complete after he has tried at least those problems. 

Over a span of nearly twenty years I have taught a course at approx- 
imately the level of this book a number of times. The penultimate draft of 
the manuscript was tried out in a class given in 1966 at Stanford University. 
Because of an anachronism that allowed only two quarters to the course (as 
if probability could also blossom faster in the California climate!), I had to 
omit the second halves of Chapters 8 and 9 but otherwise kept fairly closely 
to the text as presented here. (The second half of Chapter 9 was covered in a 
subsequent course called “stochastic processes.”) A good fraction of the
exercises were assigned as homework, and in addition a great majority of them
were worked out by volunteers. Among those in the class who cooperated 
in this manner and who corrected mistakes and suggested improvements are: 
Jack E. Clark, B. Curtis Eaves, Susan D. Horn, Alan T. Huckleberry, Thomas 
M. Liggett, and Roy E. Welsch, to whom I owe sincere thanks. The manuscript
was also read by J. L. Doob and Benton Jamison, both of whom contributed 
a great deal to the final revision. They have also used part of the manuscript 
in their classes. Aside from these personal acknowledgments, the book owes 
of course to a large number of authors of original papers, treatises, and text- 
books. I have restricted bibliographical references to the major sources while 
adding many more names among the exercises. Some oversight is perhaps 
inevitable; however, inconsequential or irrelevant “name-dropping” is delib- 
erately avoided, with two or three exceptions which should prove the rule. 

It is a pleasure to thank Rosemarie Stampfel and Gail Lemmond for their 
superb job in typing the manuscript. 


1 Distribution function 


1.1 Monotone functions 


We begin with a discussion of distribution functions as a traditional way 
of introducing probability measures. It serves as a convenient bridge from 
elementary analysis to probability theory, upon which the beginner may pause 
to review his mathematical background and test his mental agility. Some of 
the methods as well as results in this chapter are also useful in the theory of 
stochastic processes. 

In this book we shall follow the fashionable usage of the words “positive”,
“negative”, “increasing”, “decreasing” in their loose interpretation.
For example, “x is positive” means “$x \ge 0$”; the qualifier “strictly” will be
added when “$x > 0$” is meant. By a “function” we mean in this chapter a real
finite-valued one unless otherwise specified.

Let then f be an increasing function defined on the real line $(-\infty, +\infty)$.
Thus for any two real numbers $x_1$ and $x_2$,


$$x_1 < x_2 \implies f(x_1) \le f(x_2). \tag{1}$$


We begin by reviewing some properties of such a function. The notation 
“$t \uparrow x$” means “$t < x$, $t \to x$”; “$t \downarrow x$” means “$t > x$, $t \to x$”.

(i) For each x, both unilateral limits

$$\lim_{t \uparrow x} f(t) = f(x-) \quad\text{and}\quad \lim_{t \downarrow x} f(t) = f(x+) \tag{2}$$

exist and are finite. Furthermore the limits at infinity

$$\lim_{t \downarrow -\infty} f(t) = f(-\infty) \quad\text{and}\quad \lim_{t \uparrow +\infty} f(t) = f(+\infty)$$

exist; the former may be $-\infty$, the latter may be $+\infty$.

This follows from monotonicity; indeed

$$f(x-) = \sup_{-\infty < t < x} f(t), \qquad f(x+) = \inf_{x < t < +\infty} f(t).$$


(ii) For each x, f is continuous at x if and only if

$$f(x-) = f(x) = f(x+).$$

To see this, observe that the continuity of a monotone function f at x is
equivalent to the assertion that

$$\lim_{t \uparrow x} f(t) = f(x) = \lim_{t \downarrow x} f(t).$$

By (i), the limits above exist as $f(x-)$ and $f(x+)$ and

$$f(x-) \le f(x) \le f(x+), \tag{3}$$

from which (ii) follows.


In general, we say that the function f has a jump at x iff the two limits 
in (2) both exist but are unequal. The value of f at x itself, viz. f(x), may be 
arbitrary, but for an increasing f the relation (3) must hold. As a consequence 
of (i) and (ii), we have the next result. 


(iii) The only possible kind of discontinuity of an increasing function is a 
jump. [The reader should ask himself what other kinds of discontinuity there 
are for a function in general.] 

If there is a jump at x, we call x a point of jump of f and the number
$f(x+) - f(x-)$ the size of the jump or simply “the jump” at x.

It is worthwhile to observe that points of jump may have a finite point 
of accumulation and that such a point of accumulation need not be a point of 
jump itself. Thus, the set of points of jump is not necessarily a closed set. 
Example 1. Let $x_0$ be an arbitrary real number, and define a function f as follows:

$$f(x) = \begin{cases} 0 & \text{for } x \le x_0 - 1; \\ 1 - \dfrac{1}{n} & \text{for } x_0 - \dfrac{1}{n} \le x < x_0 - \dfrac{1}{n+1},\ n = 1, 2, \ldots; \\ 1 & \text{for } x \ge x_0. \end{cases}$$

The point $x_0$ is a point of accumulation of the points of jump $\{x_0 - 1/n, n \ge 1\}$, but
f is continuous at $x_0$.
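
The behaviour claimed in Example 1 can be checked numerically. The following is only a sketch, assuming the piecewise definition above with $x_0 = 0$ by default; the names `example1` and `jump_at` are illustrative and do not come from the text. Exact rational arithmetic is used so that the comparisons are not disturbed by rounding.

```python
from fractions import Fraction
from math import floor


def example1(x, x0=Fraction(0)):
    """Value of the step function of Example 1 (x0 is the accumulation point)."""
    x, x0 = Fraction(x), Fraction(x0)
    if x <= x0 - 1:
        return Fraction(0)
    if x >= x0:
        return Fraction(1)
    # Now 0 < x0 - x < 1.  The index n with x0 - 1/n <= x < x0 - 1/(n+1)
    # satisfies n <= 1/(x0 - x) < n + 1, i.e. n = floor(1 / (x0 - x)).
    n = floor(1 / (x0 - x))
    return 1 - Fraction(1, n)


def jump_at(n, x0=Fraction(0), h=Fraction(1, 10**9)):
    """f(p) - f(p - h) at p = x0 - 1/n, a stand-in for f(p+) - f(p-).

    Assumption: h is smaller than the distance from p to the previous
    breakpoint, which holds for the moderate values of n used below.
    """
    p = Fraction(x0) - Fraction(1, n)
    return example1(p, x0) - example1(p - h, x0)


if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, jump_at(n))                   # the jumps shrink as p nears x0
    print(1 - example1(Fraction(-1, 1000)))    # distance of f from 1 just left of x0
```

The jumps become small near $x_0$, and the values of f just to the left of $x_0$ approach 1, which is consistent with continuity at $x_0$.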


Before we discuss the next example, let us introduce a notation that will
be used throughout the book. For any real number t, we set

$$\delta_t(x) = \begin{cases} 0 & \text{for } x < t, \\ 1 & \text{for } x \ge t. \end{cases} \tag{4}$$

We shall call the function $\delta_t$ the point mass at t.


Example 2. Let $\{a_n, n \ge 1\}$ be any given enumeration of the set of all rational
numbers, and let $\{b_n, n \ge 1\}$ be a set of positive ($>0$) numbers such that $\sum_{n=1}^{\infty} b_n < \infty$.
For instance, we may take $b_n = 2^{-n}$. Consider now

$$f(x) = \sum_{n=1}^{\infty} b_n \delta_{a_n}(x). \tag{5}$$

Since $0 \le \delta_{a_n}(x) \le 1$ for every n and x, the series in (5) is absolutely and uniformly
convergent. Since each $\delta_{a_n}$ is increasing, it follows that if $x_1 \le x_2$,

$$f(x_2) - f(x_1) = \sum_{n=1}^{\infty} b_n [\delta_{a_n}(x_2) - \delta_{a_n}(x_1)] \ge 0.$$

Hence f is increasing. Thanks to the uniform convergence (why?) we may deduce
that for each x,

$$f(x+) - f(x-) = \sum_{n=1}^{\infty} b_n [\delta_{a_n}(x+) - \delta_{a_n}(x-)]. \tag{6}$$


But for each n, the number in the square brackets above is 0 or 1 according as $x \ne a_n$
or $x = a_n$. Hence if x is different from all the $a_n$'s, each term on the right side of (6)
vanishes; on the other hand if $x = a_k$, say, then exactly one term, that corresponding
to $n = k$, does not vanish and yields the value $b_k$ for the whole series. This proves
that the function f has jumps at all the rational points and nowhere else.
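
A finite truncation of the series in (5) can be computed exactly. The sketch below is an illustration under stated assumptions: it takes $b_n = 2^{-n}$ as suggested in the example, and it uses one particular enumeration of the rationals (0 followed by the Calkin-Wilf order of the positive rationals interleaved with their negatives), which is a choice made here and not one prescribed by the text. The helper names are hypothetical. A partial sum with N terms has jumps only at the first N points $a_n$, and it differs from f by at most $2^{-N}$ everywhere.

```python
from fractions import Fraction
from math import floor


def rationals(count):
    """First `count` terms of one enumeration of the rationals.

    Positive rationals follow the Calkin-Wilf order (1, 1/2, 2, 1/3, ...),
    interleaved with their negatives after an initial 0.
    """
    out = [Fraction(0)]
    q = Fraction(1)
    while len(out) < count:
        out.extend((q, -q))
        q = 1 / (2 * floor(q) - q + 1)   # next term of the Calkin-Wilf sequence
    return out[:count]


def point_mass(t, x):
    """delta_t(x): 0 for x < t, 1 for x >= t, as in (4)."""
    return 1 if x >= t else 0


def partial_sum(x, a):
    """N-th partial sum of (5) with b_n = 2**-n, where N = len(a)."""
    return sum(Fraction(1, 2**n) * point_mass(a_n, x)
               for n, a_n in enumerate(a, start=1))


def partial_jump(x, a):
    """Jump of the partial sum at x, computed as a difference of values.

    h is half the distance from x to the nearest other listed point, so no
    listed point other than x itself lies in (x - h, x].  Needs len(a) >= 2.
    """
    h = min(abs(x - a_n) for a_n in a if a_n != x) / 2
    return partial_sum(x, a) - partial_sum(x - h, a)


if __name__ == "__main__":
    a = rationals(8)
    print(a)
    print(partial_jump(Fraction(1, 2), a))   # 1/2 is a_4, so the jump is b_4
    print(partial_jump(Fraction(3, 4), a))   # 3/4 is not among the first 8 terms
```

In the full series every rational eventually appears in the enumeration, so the point 3/4, which shows no jump in this truncation, does carry a jump of f itself.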

This example shows that the set of points of jump of an increasing function 
may be everywhere dense; in fact the set of rational numbers in the example may 
be replaced by an arbitrary countable set without any change of the argument. We
now show that the condition of countability is indispensable. By “countable” we mean 
always “finite (possibly empty) or countably infinite”. 


(iv) The set of discontinuities of f is countable. 


We shall prove this by a topological argument of some general applicability.
In Exercise 3 after this section another proof based on an equally useful
counting argument will be indicated. For each point of jump x consider the
open interval $I_x = (f(x-), f(x+))$. If $x'$ is another point of jump and $x < x'$,
say, then there is a point $\tilde{x}$ such that $x < \tilde{x} < x'$. Hence by monotonicity we
have

$$f(x+) \le f(\tilde{x}) \le f(x'-).$$

It follows that the two intervals $I_x$ and $I_{x'}$ are disjoint, though they may
abut on each other if $f(x+) = f(x'-)$. Thus we may associate with the set of
points of jump in the domain of f a certain collection of pairwise disjoint open 
intervals in the range of f. Now any such collection is necessarily a countable 
one, since each interval contains a rational number, so that the collection of 
intervals is in one-to-one correspondence with a certain subset of the rational 
numbers and the latter is countable. Therefore the set of discontinuities is also 
countable, since it is in one-to-one correspondence with the set of intervals 
associated with it. 


(v) Let $f_1$ and $f_2$ be two increasing functions and D a set that is (everywhere)
dense in $(-\infty, +\infty)$. Suppose that

$$\forall x \in D: f_1(x) = f_2(x).$$

Then $f_1$ and $f_2$ have the same points of jump of the same size, and they
coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let $t_n \in D$, $t_n' \in D$, $t_n \uparrow x$,
$t_n' \downarrow x$. Such sequences exist since D is dense. It follows from (i) that

$$f_1(x-) = \lim_n f_1(t_n) = \lim_n f_2(t_n) = f_2(x-),$$
$$f_1(x+) = \lim_n f_1(t_n') = \lim_n f_2(t_n') = f_2(x+).$$

In particular

$$\forall x: f_1(x+) - f_1(x-) = f_2(x+) - f_2(x-).$$


The first assertion in (v) follows from this equation and (ii). Furthermore if
$f_1$ is continuous at x, then so is $f_2$ by what has just been proved, and we
have

$$f_1(x) = f_1(x-) = f_2(x-) = f_2(x),$$


proving the second assertion. 

How can $f_1$ and $f_2$ differ at all? This can happen only when $f_1(x)$
and $f_2(x)$ assume different values in the interval
$(f_1(x-), f_1(x+)) = (f_2(x-), f_2(x+))$. It will turn out in Chapter 2 (see in particular Exercise 21
of Sec. 2.2) that the precise value of f at a point of jump is quite unessential
for our purposes and may be modified, subject to (3), to suit our convenience.
More precisely, given the function f, we can define a new function $\tilde{f}$ in
several different ways, such as

$$\tilde{f}(x) = f(x-), \qquad \tilde{f}(x) = f(x+), \qquad \tilde{f}(x) = \frac{f(x-) + f(x+)}{2},$$

and use one of these instead of the original one. The third modification is
found to be convenient in Fourier analysis, but either one of the first two is
more suitable for probability theory. We have a free choice between them and
we shall choose the second, namely, right continuity.


(vi) If we put

$$\forall x: \tilde{f}(x) = f(x+),$$

then $\tilde{f}$ is increasing and right continuous everywhere.

Let us recall that an arbitrary function g is said to be right continuous at
x iff $\lim_{t \downarrow x} g(t)$ exists and the limit, to be denoted by $g(x+)$, is equal to $g(x)$.
To prove the assertion (vi) we must show that

$$\forall x: \lim_{t \downarrow x} f(t+) = f(x+).$$

This is indeed true for any f such that $f(t+)$ exists for every t. For then:
given any $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that

$$\forall s \in (x, x + \delta): |f(s) - f(x+)| \le \epsilon.$$

Let $t \in (x, x + \delta)$ and let $s \downarrow t$ in the above, then we obtain

$$|f(t+) - f(x+)| \le \epsilon,$$

which proves that $\tilde{f}$ is right continuous. It is easy to see that it is increasing
if f is so.


Let D be dense in $(-\infty, +\infty)$, and suppose that f is a function with the
domain D. We may speak of the monotonicity, continuity, uniform continuity,
and so on of f on its domain of definition if in the usual definitions we restrict
ourselves to the points of D. Even if f is defined in a larger domain, we may 
still speak of these properties “on D” by considering the “restriction of f 
to D”. 


(vii) Let f be increasing on D, and define $\tilde{f}$ on $(-\infty, +\infty)$ as follows:

$$\forall x: \tilde{f}(x) = \inf_{x < t \in D} f(t).$$

Then $\tilde{f}$ is increasing and right continuous everywhere.

This is a generalization of (vi). $\tilde{f}$ is clearly increasing. To prove right
continuity let an arbitrary $x_0$ and $\epsilon > 0$ be given. There exists $t_0 \in D$, $t_0 > x_0$,
such that

$$f(t_0) - \epsilon \le \tilde{f}(x_0) \le f(t_0).$$

Hence if $t \in D$, $x_0 < t < t_0$, we have

$$0 \le f(t) - \tilde{f}(x_0) \le f(t_0) - \tilde{f}(x_0) \le \epsilon.$$

This implies by the definition of $\tilde{f}$ that for $x_0 < x < t_0$ we have

$$0 \le \tilde{f}(x) - \tilde{f}(x_0) \le \epsilon.$$

Since $\epsilon$ is arbitrary, it follows that $\tilde{f}$ is right continuous at $x_0$, as was to be
shown.


EXERCISES 


1. Prove that for the f in Example 2 we have

$$f(-\infty) = 0, \qquad f(+\infty) = \sum_{n=1}^{\infty} b_n.$$


2. Construct an increasing function on $(-\infty, +\infty)$ with a jump of size
one at each integer, and constant between jumps. Such a function cannot be
represented as $\sum_{n=1}^{\infty} b_n \delta_n(x)$ with $b_n = 1$ for each n, but a slight modification
will do. Spell this out.


*3. Suppose that f is increasing and that there exist real numbers A and
B such that $\forall x: A \le f(x) \le B$. Show that for each $\epsilon > 0$, the number of jumps
of size exceeding $\epsilon$ is at most $(B - A)/\epsilon$. Hence prove (iv), first for bounded
f and then in general.


* indicates specially selected exercises (as mentioned in the Preface).
$b_j$ the size at jump at $a_j$, then

$$F(a_j) - F(a_j-) = b_j$$

since $F(a_j+) = F(a_j)$. Consider the function

$$F_d(x) = \sum_j b_j \delta_{a_j}(x)$$

which represents the sum of all the jumps of F in the half-line $(-\infty, x]$. It is
clearly increasing, right continuous, with

$$F_d(-\infty) = 0, \qquad F_d(+\infty) = \sum_j b_j \le 1. \tag{3}$$

Hence $F_d$ is a bounded increasing function. It should constitute the “jumping
part” of F, and if it is subtracted out from F, the remainder should be positive,
contain no more jumps, and so be continuous. These plausible statements will
now be proved; they are easy enough but not really trivial.


Theorem 1.2.1. Let

$$F_c(x) = F(x) - F_d(x);$$

then $F_c$ is positive, increasing, and continuous.


PROOF. Let $x < x'$, then we have

$$F_d(x') - F_d(x) = \sum_{x < a_j \le x'} b_j = \sum_{x < a_j \le x'} [F(a_j) - F(a_j-)] \le F(x') - F(x). \tag{4}$$

It follows that both $F_d$ and $F_c$ are increasing, and if we put $x = -\infty$ in the
above, we see that $F_d \le F$ and so $F_c$ is indeed positive. Next, $F_d$ is right
continuous since each $\delta_{a_j}$ is and the series defining $F_d$ converges uniformly
in x; the same argument yields (cf. Example 2 of Sec. 1.1)

$$F_d(x) - F_d(x-) = \begin{cases} b_j & \text{if } x = a_j, \\ 0 & \text{otherwise.} \end{cases}$$

Now this evaluation holds also if $F_d$ is replaced by F according to the definition
of $a_j$ and $b_j$, hence we obtain for each x:

$$F_c(x) - F_c(x-) = F(x) - F(x-) - [F_d(x) - F_d(x-)] = 0.$$

This shows that $F_c$ is left continuous; since it is also right continuous, being
the difference of two such functions, it is continuous.
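
To see the decomposition of Theorem 1.2.1 on a concrete case, the sketch below uses a hypothetical d.f. chosen only for illustration: it has jumps of size 1/4 at 0 and at 1 and rises linearly in between. The function names and the dictionary `JUMPS` are assumptions of this example, not notation from the text.

```python
from fractions import Fraction


def F(x):
    """A hypothetical d.f. with jumps of size 1/4 at 0 and at 1."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 4) + x / 2
    return Fraction(1)


# Points of jump a_j mapped to jump sizes b_j = F(a_j) - F(a_j-).
JUMPS = {Fraction(0): Fraction(1, 4), Fraction(1): Fraction(1, 4)}


def F_d(x):
    """Jumping part: the sum of b_j over all a_j <= x."""
    return sum((b for a, b in JUMPS.items() if a <= x), Fraction(0))


def F_c(x):
    """Remainder F - F_d; Theorem 1.2.1 asserts that it is continuous."""
    return F(x) - F_d(x)


if __name__ == "__main__":
    h = Fraction(1, 1000)
    for a in JUMPS:
        # F changes by about b_j across a_j, while F_c changes by at most h/2.
        print(a, F(a) - F(a - h), F_c(a) - F_c(a - h))
```

Here $F_c(x)$ equals 0 for $x < 0$, $x/2$ on $[0, 1)$, and 1/2 for $x \ge 1$, so the increments of $F_c$ across the two points of jump shrink with h while those of F do not.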
Theorem 1.2.2. Let F be a d.f. Suppose that there exist a continuous function
$G_c$ and a function $G_d$ of the form

$$G_d(x) = \sum_j b_j' \delta_{a_j'}(x)$$

[where $\{a_j'\}$ is a countable set of real numbers and $\sum_j |b_j'| < \infty$], such that

$$F = G_c + G_d,$$

then

$$G_c = F_c, \qquad G_d = F_d,$$

where $F_c$ and $F_d$ are defined as before.


PROOF. If Fy A Gg, then either the sets {a;} and {a’;} are not identical, 
or we may relabel the a’, so that a, = a; for all j but bi, # b; for some j. In 
either case we have for at least one j, and a = a; or aj: 


[Fq(a) — Fa(a—)] — [Ga(a) — Ga(a—)] #0. 
Since F, — G, = Gq — Fg, this implies that 
F(a) — G,(a) — [F-(a—) — G-(a—)] £0, 


contradicting the fact that F, — G, is a continuous function. Hence Fg = Gy 
and consequently F,. = Ge. 


DEFINITION. A d.f. $F$ that can be represented in the form
$$F = \sum_j b_j \delta_{a_j},$$

where $\{a_j\}$ is a countable set of real numbers, $b_j > 0$ for every $j$ and $\sum_j b_j = 1$,
is called a discrete d.f. A d.f. that is continuous everywhere is called a
continuous d.f.


Suppose $F_c \not\equiv 0$, $F_d \not\equiv 0$ in Theorem 1.2.1, then we may set $\alpha = F_d(\infty)$
so that $0 < \alpha < 1$,
$$F_1 = \frac{1}{\alpha} F_d, \quad F_2 = \frac{1}{1 - \alpha} F_c,$$
and write
(5) $$F = \alpha F_1 + (1 - \alpha) F_2.$$

Now $F_1$ is a discrete d.f., $F_2$ is a continuous d.f., and $F$ is exhibited as a
convex combination of them. If $F_c \equiv 0$, then $F$ is discrete and we set $\alpha = 1$,
$F_1 \equiv F$, $F_2 \equiv 0$; if $F_d \equiv 0$, then $F$ is continuous and we set $\alpha = 0$, $F_1 \equiv 0$,
$F_2 \equiv F$; in either extreme case (5) remains valid. We may now summarize
the two theorems above as follows.


Theorem 1.2.3. Every d.f. can be written as the convex combination of a 
discrete and a continuous one. Such a decomposition is unique. 
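
As a hedged illustration of the convex combination (5), the sketch below uses an assumed d.f. with a single jump of size 0.3 at the origin followed by an exponential-shaped continuous rise. The constant `ALPHA`, the particular `F`, and the helper names are choices made for this example only.

```python
import math

ALPHA = 0.3  # assumed total jump mass, playing the role of F_d(+inf)

def F(x):
    """Assumed d.f.: jump of size ALPHA at 0, then a continuous rise to 1."""
    return 0.0 if x < 0 else ALPHA + (1 - ALPHA) * (1 - math.exp(-x))

def F_d(x):
    """Jumping part: the single jump of size ALPHA at 0."""
    return ALPHA if x >= 0 else 0.0

def F_c(x):
    """Continuous part F - F_d."""
    return F(x) - F_d(x)

def F1(x):
    """Normalized discrete d.f. F_d / alpha."""
    return F_d(x) / ALPHA

def F2(x):
    """Normalized continuous d.f. F_c / (1 - alpha)."""
    return F_c(x) / (1 - ALPHA)

if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 3.0):
        print(x, F(x), ALPHA * F1(x) + (1 - ALPHA) * F2(x))
```

The two printed columns agree up to rounding, and both `F1` and `F2` tend to 1 at $+\infty$, so each is a d.f. in its own right.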


EXERCISES 


1. Let $F$ be a d.f. Then for each $x$,
$$\lim_{\epsilon \downarrow 0} [F(x + \epsilon) - F(x - \epsilon)] = 0$$

unless $x$ is a point of jump of $F$, in which case the limit is equal to the size
of the jump.


*2. Let $F$ be a d.f. with points of jump $\{a_j\}$. Prove that the sum

$$\sum_{x - \epsilon < a_j < x} [F(a_j) - F(a_j-)]$$

converges to zero as $\epsilon \downarrow 0$, for every $x$. What if the summation above is
extended to $x - \epsilon < a_j \le x$ instead? Give another proof of the continuity of
$F_c$ in Theorem 1.2.1 by using this problem.

3. A plausible verbal definition of a discrete d.f. may be given thus: “It
is a d.f. that has jumps and is constant between jumps.” [Such a function is
sometimes called a “step function”, though the meaning of this term does not
seem to be well established.] What is wrong with this? But suppose that the set
of points of jump is “discrete” in the Euclidean topology, then the definition
is valid (apart from our convention of right continuity).

4. For a general increasing function $F$ there is a similar decomposition
$F = F_c + F_d$, where both $F_c$ and $F_d$ are increasing, $F_c$ is continuous, and
$F_d$ is “purely jumping”. [HINT: Let $a$ be a point of continuity, put $F_d(a) =
F(a)$, add jumps in $(a, \infty)$ and subtract jumps in $(-\infty, a)$ to define $F_d$. Cf.
Exercise 2 in Sec. 1.1.]

5. Theorem 1.2.2 can be generalized to any bounded increasing function.
More generally, let $f$ be the difference of two bounded increasing functions on
$(-\infty, +\infty)$; such a function is said to be of bounded variation there. Define
its purely discontinuous and continuous parts and prove the corresponding
decomposition theorem.

*6. A point $x$ is said to belong to the support of the d.f. $F$ iff for every
$\epsilon > 0$ we have $F(x + \epsilon) - F(x - \epsilon) > 0$. The set of all such $x$ is called the
support of $F$. Show that each point of jump belongs to the support, and that
each isolated point of the support is a point of jump. Give an example of a
discrete d.f. whose support is the whole line.

7. Prove that the support of any d.f. is a closed set, and the support of
any continuous d.f. is a perfect set.


1.3 Absolutely continuous and singular distributions 


Further analysis of d.f.’s requires the theory of Lebesgue measure. Throughout
the book this measure will be denoted by $m$; “almost everywhere” on the real
line without qualification will refer to it and be abbreviated to “a.e.”; an
integral written in the form $\int \ldots\, dt$ is a Lebesgue integral; a function $f$ is
said to be “integrable” in $(a, b)$ iff

$$\int_a^b f(t)\, dt$$

is defined and finite [this entails, of course, that $f$ be Lebesgue measurable].
The class of such functions will be denoted by $L^1(a, b)$, and $L^1(-\infty, \infty)$ is
abbreviated to $L^1$. The complement of a subset $S$ of an understood “space”
such as $(-\infty, +\infty)$ will be denoted by $S^c$.


DEFINITION. A function $F$ is called absolutely continuous [in $(-\infty, \infty)$
and with respect to the Lebesgue measure] iff there exists a function $f$ in $L^1$
such that we have for every $x < x'$:

(1) $$F(x') - F(x) = \int_x^{x'} f(t)\, dt.$$

It follows from a well-known proposition (see, e.g., Natanson [3]*) that such
a function $F$ has a derivative equal to $f$ a.e. In particular, if $F$ is a d.f., then

(2) $$f \ge 0 \text{ a.e.} \quad \text{and} \quad \int_{-\infty}^{\infty} f(t)\, dt = 1.$$

Conversely, given any $f$ in $L^1$ satisfying the conditions in (2), the function $F$
defined by

(3) $$\forall x\colon F(x) = \int_{-\infty}^{x} f(t)\, dt$$

is easily seen to be a d.f. that is absolutely continuous.
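
Relations (1) and (2) can be spot-checked numerically for a concrete density. In the sketch below the density `f` (exponential with rate 1) and its closed-form integral `F` are assumptions chosen for illustration, and `integrate` is a plain midpoint rule standing in for the Lebesgue integral, which is adequate here only because this `f` is piecewise continuous.

```python
import math

def f(t):
    """Assumed density: exponential with rate 1, zero on the negative half-line."""
    return math.exp(-t) if t >= 0 else 0.0

def F(x):
    """Closed form of the integral of f over (-inf, x], as in (3)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint rule on [a, b]; a Riemann-sum stand-in for the integral."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    x, x_prime = 0.5, 2.0
    print(F(x_prime) - F(x), integrate(f, x, x_prime))   # relation (1)
    print(integrate(f, 0.0, 40.0))                       # total mass, relation (2)
```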


DEFINITION. A function F is called singular iff it is not identically zero 
and F’ (exists and) equals zero a.e. 


* Numbers in brackets refer to the General Bibliography. 
The next theorem summarizes some basic facts of real function theory;
see, e.g., Natanson [3].


Theorem 1.3.1. Let $F$ be bounded increasing with $F(-\infty) = 0$, and let $F'$
denote its derivative wherever existing. Then the following assertions are true.


(a) If $S$ denotes the set of all $x$ for which $F'(x)$ exists with $0 \le F'(x) <
\infty$, then $m(S^c) = 0$.
(b) This $F'$ belongs to $L^1$, and we have for every $x < x'$:

(4) $$\int_x^{x'} F'(t)\, dt \le F(x') - F(x).$$

(c) If we put

(5) $$\forall x\colon F_{ac}(x) = \int_{-\infty}^{x} F'(t)\, dt, \quad F_s(x) = F(x) - F_{ac}(x),$$

then $F'_{ac} = F'$ a.e. so that $F'_s = F' - F'_{ac} = 0$ a.e. and consequently $F_s$ is
singular if it is not identically zero.


DEFINITION. Any positive function $f$ that is equal to $F'$ a.e. is called a
density of $F$. $F_{ac}$ is called the absolutely continuous part, $F_s$ the singular part
of $F$. Note that the previous $F_d$ is part of $F_s$ as defined here.


It is clear that $F_{ac}$ is increasing and $F_{ac} \le F$. From (4) it follows that if
$x < x'$,

$$F_s(x') - F_s(x) = F(x') - F(x) - \int_x^{x'} f(t)\, dt \ge 0.$$

Hence $F_s$ is also increasing and $F_s \le F$. We are now in a position to announce
the following result, which is a refinement of Theorem 1.2.3.


Theorem 1.3.2. Every d.f. F can be written as the convex combination of 
a discrete, a singular continuous, and an absolutely continuous d.f. Such a 
decomposition is unique. 


EXERCISES 


1. A d.f. $F$ is singular if and only if $F = F_s$; it is absolutely continuous
if and only if $F \equiv F_{ac}$.
2. Prove Theorem 1.3.2. 


*3. If the support of a d.f. (see Exercise 6 of Sec. 1.2) is of measure zero, 
then F is singular. The converse is false. 
*4. Suppose that $F$ is a d.f. and (3) holds with a continuous $f$. Then
$F' = f \ge 0$ everywhere.

5. Under the conditions in the preceding exercise, the support of $F$ is the
closure of the set $\{t \mid f(t) > 0\}$; the complement of the support is the interior
of the set $\{t \mid f(t) = 0\}$.

6. Prove that a discrete distribution is singular. [Cf. Exercise 13 of 
Sec. 2.2.] 

7. Prove that a singular function as defined here is (Lebesgue) measurable
but need not be of bounded variation even locally. [HINT: Such a function is
continuous except on a set of Lebesgue measure zero; use the completeness
of the Lebesgue measure.]


The remainder of this section is devoted to the construction of a singular
continuous distribution. For this purpose let us recall the construction of the
Cantor (ternary) set (see, e.g., Natanson [3]). From the closed interval [0,1],
the “middle third” open interval $(\frac{1}{3}, \frac{2}{3})$ is removed; from each of the two
remaining disjoint closed intervals the middle third, $(\frac{1}{9}, \frac{2}{9})$ and $(\frac{7}{9}, \frac{8}{9})$,
respectively, are removed and so on. After $n$ steps, we have removed

$$1 + 2 + \cdots + 2^{n-1} = 2^n - 1$$

disjoint open intervals and are left with $2^n$ disjoint closed intervals each of
length $1/3^n$. Let these removed ones, in order of position from left to right,
be denoted by $J_{n,k}$, $1 \le k \le 2^n - 1$, and their union by $U_n$. We have

$$m(U_n) = \frac{1}{3} + \frac{2}{3^2} + \frac{4}{3^3} + \cdots + \frac{2^{n-1}}{3^n} = 1 - \left(\frac{2}{3}\right)^n.$$

As $n \uparrow \infty$, $U_n$ increases to an open set $U$; the complement $C$ of $U$ with
respect to [0,1] is a perfect set, called the Cantor set. It is of measure zero
since
$$m(C) = 1 - m(U) = 1 - 1 = 0.$$
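
The counting and the measure computation above can be reproduced with exact rational arithmetic. This is a minimal sketch; the function names `removed_intervals` and `measure` are made up for the illustration.

```python
from fractions import Fraction

def removed_intervals(n):
    """Open intervals removed in the first n steps, sorted from left to right."""
    closed = [(Fraction(0), Fraction(1))]   # closed intervals still remaining
    removed = []
    for _ in range(n):
        next_closed = []
        for a, b in closed:
            third = (b - a) / 3
            removed.append((a + third, b - third))          # open middle third
            next_closed += [(a, a + third), (b - third, b)]
        closed = next_closed
    return sorted(removed)

def measure(intervals):
    """Total length of a list of disjoint intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))

if __name__ == "__main__":
    for n in range(1, 6):
        J = removed_intervals(n)
        print(n, len(J), measure(J), 1 - Fraction(2, 3) ** n)
```

For each $n$ the list has $2^n - 1$ entries and its total length equals $1 - (2/3)^n$; the $k$-th entry of the sorted list plays the role of $J_{n,k}$.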


Now for each $n$ and $k$, $n \ge 1$, $1 \le k \le 2^n - 1$, we put

$$c_{n,k} = \frac{k}{2^n}$$

and define a function $F$ on $U$ as follows:
(7) $$F(x) = c_{n,k} \quad \text{for } x \in J_{n,k}.$$

This definition is consistent since two intervals, $J_{n,k}$ and $J_{n',k'}$, are either
disjoint or identical, and in the latter case so are $c_{n,k} = c_{n',k'}$. The last assertion
becomes obvious if we proceed step by step and observe that
$$J_{n+1,2k} = J_{n,k}, \quad c_{n+1,2k} = c_{n,k} \quad \text{for } 1 \le k \le 2^n - 1.$$

The value of $F$ is constant on each $J_{n,k}$ and is strictly greater on any other
$J_{n',k'}$ situated to the right of $J_{n,k}$. Thus $F$ is increasing and clearly we have

$$\lim_{x \downarrow 0} F(x) = 0, \quad \lim_{x \uparrow 1} F(x) = 1.$$

Let us complete the definition of $F$ by setting
$$F(x) = 0 \text{ for } x \le 0, \quad F(x) = 1 \text{ for } x \ge 1.$$

$F$ is now defined on the domain $D = (-\infty, 0) \cup U \cup (1, \infty)$ and increasing
there. Since each $J_{n,k}$ is at a distance $\ge 1/3^n$ from any distinct $J_{n,k'}$ and the
total variation of $F$ over each of the $2^n$ disjoint intervals that remain after
removing $J_{n,k}$, $1 \le k \le 2^n - 1$, is $1/2^n$, it follows that

$$0 \le x' - x \le \frac{1}{3^n} \Rightarrow 0 \le F(x') - F(x) \le \frac{1}{2^n}.$$

Hence $F$ is uniformly continuous on $D$. By Exercise 5 of Sec. 1.1, there exists
a continuous increasing $\tilde{F}$ on $(-\infty, +\infty)$ that coincides with $F$ on $D$. This
$\tilde{F}$ is a continuous d.f. that is constant on each $J_{n,k}$. It follows that $\tilde{F}' = 0$
on $U$ and so also on $(-\infty, +\infty) - C$. Thus $\tilde{F}$ is singular. Alternatively, it
is clear that none of the points in $D$ is in the support of $\tilde{F}$, hence the latter
is contained in $C$ and of measure 0, so that $\tilde{F}$ is singular by Exercise 3
above. [In Exercise 13 of Sec. 2.2, it will become obvious that the measure
corresponding to $\tilde{F}$ is singular because there is no mass in $U$.]
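
The Cantor d.f. can be evaluated approximately by reading off ternary digits. Note that this is a standard alternative evaluation scheme (related to the ternary expansion mentioned in Exercise 9), not the extension-by-continuity argument used in the text; the function name `cantor` and the truncation parameter `depth` are assumptions of this sketch, and floating-point arithmetic makes the result approximate.

```python
def cantor(x, depth=40):
    """Approximate value of the Cantor d.f. at x via its ternary digits."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x lies in (the closure of) a removed interval: F is constant there.
            return value + scale
        value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

if __name__ == "__main__":
    for x in (0.15, 0.4, 0.8, 0.25):
        print(x, cantor(x))
```

Points of $J_{2,1}$, $J_{1,1}$, and $J_{2,3}$ (such as 0.15, 0.4, and 0.8) give the constants $c_{2,1} = 1/4$, $c_{1,1} = 1/2$, and $c_{2,3} = 3/4$, in agreement with (7).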


EXERCISES 


The F in these exercises is the F defined above. 
8. Prove that the support of F is exactly C. 


*9. It is well known that any point $x$ in $C$ has a ternary expansion without
the digit 1:

10. For each $x \in [0, 1]$, we have

$$2F\left(\frac{x}{3}\right) = F(x), \quad 2F\left(\frac{x}{3} + \frac{2}{3}\right) - 1 = F(x).$$

11. Calculate

$$\int_0^1 x\, dF(x), \quad \int_0^1 x^2\, dF(x), \quad \int_0^1 e^{itx}\, dF(x).$$

[HINT: This can be done directly or by using Exercise 10; for a third method
see Exercise 9 of Sec. 5.3.]

12. Extend the function $F$ on [0,1] trivially to $(-\infty, \infty)$. Let $\{r_n\}$ be an
enumeration of the rationals and

$$G(x) = \sum_{n=1}^{\infty} \frac{1}{2^n} F(r_n + x).$$

Show that $G$ is a d.f. that is strictly increasing for all $x$ and singular. Thus we
have a singular d.f. with support $(-\infty, \infty)$.

*13. Consider $F$ on [0,1]. Modify its inverse $F^{-1}$ suitably to make it
single-valued in [0,1]. Show that $F^{-1}$ so modified is a discrete d.f. and find
its points of jump and their sizes.

14. Given any closed set $C$ in $(-\infty, +\infty)$, there exists a d.f. whose
support is exactly $C$. [HINT: Such a problem becomes easier when the
corresponding measure is considered; see Sec. 2.2 below.]

*15. The Cantor d.f. $F$ is a good building block of “pathological”
examples. For example, let $H$ be the inverse of the homeomorphic map of [0,1]
onto itself: $x \to \frac{1}{2}[F(x) + x]$; and $E$ a subset of [0,1] which is not Lebesgue
measurable. Show that

$$1_{H(E)} \circ H = 1_E$$

where $H(E)$ is the image of $E$, $1_B$ is the indicator function of $B$, and $\circ$ denotes
the composition of functions. Hence deduce: (1) a Lebesgue measurable
function of a strictly increasing and continuous function need not be Lebesgue
measurable; (2) there exists a Lebesgue measurable function that is not Borel
measurable.


2 | Measure theory


2.1 Classes of sets 


Let $\Omega$ be an “abstract space”, namely a nonempty set of elements to be
called “points” and denoted generically by $\omega$. Some of the usual
operations and relations between sets, together with the usual notation, are given
below.


Union : $E \cup F$, $\bigcup_n E_n$
Intersection : $E \cap F$, $\bigcap_n E_n$
Complement : $E^c = \Omega \setminus E$
Difference : $E \setminus F = E \cap F^c$
Symmetric difference : $E \triangle F = (E \setminus F) \cup (F \setminus E)$
Singleton : $\{\omega\}$


Containing (for subsets of $\Omega$ as well as for collections thereof):
$E \subset F$, $F \supset E$ (not excluding $E = F$)
$\mathcal{A} \subset \mathcal{B}$, $\mathcal{B} \supset \mathcal{A}$ (not excluding $\mathcal{A} = \mathcal{B}$)
Belonging (for elements as well as for sets):
$\omega \in E$, $E \in \mathcal{A}$
Empty set: $\emptyset$
The reader is supposed to be familiar with the elementary properties of these
operations.

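A nonempty collection $\mathcal{A}$ of subsets of $\Omega$ may have certain “closure
properties”. Let us list some of those used below; note that $j$ is always an
index for a countable set and that commas as well as semicolons are used to
denote “conjunctions” of premises.

(i) $E \in \mathcal{A} \Rightarrow E^c \in \mathcal{A}$.
(ii) $E_1 \in \mathcal{A}, E_2 \in \mathcal{A} \Rightarrow E_1 \cup E_2 \in \mathcal{A}$.
(iii) $E_1 \in \mathcal{A}, E_2 \in \mathcal{A} \Rightarrow E_1 \cap E_2 \in \mathcal{A}$.
(iv) $\forall n \ge 2\colon E_j \in \mathcal{A}, 1 \le j \le n \Rightarrow \bigcup_{j=1}^n E_j \in \mathcal{A}$.
(v) $\forall n \ge 2\colon E_j \in \mathcal{A}, 1 \le j \le n \Rightarrow \bigcap_{j=1}^n E_j \in \mathcal{A}$.
(vi) $E_j \in \mathcal{A}; E_j \subset E_{j+1}, 1 \le j < \infty \Rightarrow \bigcup_{j=1}^\infty E_j \in \mathcal{A}$.
(vii) $E_j \in \mathcal{A}; E_j \supset E_{j+1}, 1 \le j < \infty \Rightarrow \bigcap_{j=1}^\infty E_j \in \mathcal{A}$.
(viii) $E_j \in \mathcal{A}, 1 \le j < \infty \Rightarrow \bigcup_{j=1}^\infty E_j \in \mathcal{A}$.
(ix) $E_j \in \mathcal{A}, 1 \le j < \infty \Rightarrow \bigcap_{j=1}^\infty E_j \in \mathcal{A}$.
(x) $E_1 \in \mathcal{A}, E_2 \in \mathcal{A}, E_1 \subset E_2 \Rightarrow E_2 \setminus E_1 \in \mathcal{A}$.

It follows from simple set algebra that under (i): (ii) and (iii) are
equivalent; (vi) and (vii) are equivalent; (viii) and (ix) are equivalent. Also, (ii)
implies (iv) and (iii) implies (v) by induction. It is trivial that (viii) implies
(ii) and (vi); (ix) implies (iii) and (vii).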


DEFINITION. A nonempty collection $\mathcal{F}$ of subsets of $\Omega$ is called a field iff
(i) and (ii) hold. It is called a monotone class (M.C.) iff (vi) and (vii) hold. It
is called a Borel field (B.F.) iff (i) and (viii) hold.


Theorem 2.1.1. A field is a B.F. if and only if it is also an M.C.


PROOF. The “only if” part is trivial; to prove the “if” part we show that
(iv) and (vi) imply (viii). Let $E_j \in \mathcal{F}$ for $1 \le j < \infty$, then
$$F_n = \bigcup_{j=1}^n E_j \in \mathcal{F}$$
by (iv), which holds in a field; moreover $F_n \subset F_{n+1}$ and
$$\bigcup_{j=1}^\infty E_j = \bigcup_{j=1}^\infty F_j;$$
hence $\bigcup_{j=1}^\infty E_j \in \mathcal{F}$ by (vi).


The collection $\mathcal{J}$ of all subsets of $\Omega$ is a B.F. called the total B.F.; the
collection of the two sets $\{\emptyset, \Omega\}$ is a B.F. called the trivial B.F. If $A$ is any
index set and if for every $\alpha \in A$, $\mathcal{F}_\alpha$ is a B.F. (or M.C.) then the intersection
$\bigcap_{\alpha \in A} \mathcal{F}_\alpha$ of all these B.F.’s (or M.C.’s), namely the collection of sets each
of which belongs to all $\mathcal{F}_\alpha$, is also a B.F. (or M.C.). Given any nonempty
collection $\mathcal{C}$ of sets, there is a minimal B.F. (or field, or M.C.) containing it;
this is just the intersection of all B.F.’s (or fields, or M.C.’s) containing $\mathcal{C}$,
of which there is at least one, namely the $\mathcal{J}$ mentioned above. This minimal
B.F. (or field, or M.C.) is also said to be generated by $\mathcal{C}$. In particular if $\mathcal{F}_0$
is a field there is a minimal B.F. (or M.C.) containing $\mathcal{F}_0$.
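
On a finite space every field is automatically a B.F., so the B.F. generated by a collection can be computed by brute-force closure rather than by intersecting all B.F.’s containing it. The sketch below does this; the function name `generated_field` and the sample spaces are hypothetical, and the approach is only feasible for very small $\Omega$.

```python
from itertools import combinations

def generated_field(omega, collection):
    """Smallest family containing `collection`, closed under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in collection} | {omega}
    changed = True
    while changed:
        changed = False
        current = list(family)           # snapshot; new sets are handled next pass
        for e in current:
            if omega - e not in family:  # closure property (i)
                family.add(omega - e)
                changed = True
        for e, g in combinations(current, 2):
            if e | g not in family:      # closure property (ii)
                family.add(e | g)
                changed = True
    return family

if __name__ == "__main__":
    fam = generated_field({1, 2, 3, 4}, [{1}, {2}])
    print(sorted(sorted(s) for s in fam))
```

With generators $\{1\}$ and $\{2\}$ in a four-point space the atoms are $\{1\}$, $\{2\}$, $\{3,4\}$, so the generated B.F. has $2^3 = 8$ members.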
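Theorem 2.1.2. Let $\mathcal{F}_0$ be a field, $\mathcal{G}$ the minimal M.C. containing $\mathcal{F}_0$, $\mathcal{F}$ the
minimal B.F. containing $\mathcal{F}_0$, then $\mathcal{F} = \mathcal{G}$.


PROOF. Since a B.F. is an M.C., we have $\mathcal{F} \supset \mathcal{G}$. To prove $\mathcal{F} \subset \mathcal{G}$ it is
sufficient to show that $\mathcal{G}$ is a B.F. Hence by Theorem 2.1.1 it is sufficient to
show that $\mathcal{G}$ is a field. We shall show that it is closed under intersection and
complementation. Define two classes of subsets of $\mathcal{G}$ as follows:

$$\mathcal{C}_1 = \{E \in \mathcal{G} : E \cap F \in \mathcal{G} \text{ for all } F \in \mathcal{F}_0\},$$
$$\mathcal{C}_2 = \{E \in \mathcal{G} : E \cap F \in \mathcal{G} \text{ for all } F \in \mathcal{G}\}.$$

The identities

$$F \cap \left(\bigcup_{j=1}^\infty E_j\right) = \bigcup_{j=1}^\infty (F \cap E_j),$$
$$F \cap \left(\bigcap_{j=1}^\infty E_j\right) = \bigcap_{j=1}^\infty (F \cap E_j)$$

show that both $\mathcal{C}_1$ and $\mathcal{C}_2$ are M.C.’s. Since $\mathcal{F}_0$ is closed under intersection and
contained in $\mathcal{G}$, it is clear that $\mathcal{F}_0 \subset \mathcal{C}_1$. Hence $\mathcal{G} \subset \mathcal{C}_1$ by the minimality of $\mathcal{G}$
and so $\mathcal{G} = \mathcal{C}_1$. This means for any $F \in \mathcal{F}_0$ and $E \in \mathcal{G}$ we have $F \cap E \in \mathcal{G}$,
which in turn means $\mathcal{F}_0 \subset \mathcal{C}_2$. Hence $\mathcal{G} = \mathcal{C}_2$ and this means $\mathcal{G}$ is closed under
intersection.
Next, define another class of subsets of $\mathcal{G}$ as follows:
$$\mathcal{C}_3 = \{E \in \mathcal{G} : E^c \in \mathcal{G}\}.$$

The (DeMorgan) identities

$$\left(\bigcup_{j=1}^\infty E_j\right)^c = \bigcap_{j=1}^\infty E_j^c,$$
$$\left(\bigcap_{j=1}^\infty E_j\right)^c = \bigcup_{j=1}^\infty E_j^c$$

show that $\mathcal{C}_3$ is a M.C. Since $\mathcal{F}_0 \subset \mathcal{C}_3$, it follows as before that $\mathcal{G} = \mathcal{C}_3$, which
means $\mathcal{G}$ is closed under complementation. The proof is complete.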


Corollary. Let $\mathcal{F}_0$ be a field, $\mathcal{F}$ the minimal B.F. containing $\mathcal{F}_0$; $\mathcal{C}$ a class
of sets containing $\mathcal{F}_0$ and having the closure properties (vi) and (vii), then $\mathcal{C}$
contains $\mathcal{F}$.


The theorem above is one of a type called monotone class theorems. They
are among the most useful tools of measure theory, and serve to extend certain
relations which are easily verified for a special class of sets or functions to a
larger class. Many versions of such theorems are known; see Exercises 10, 11,
and 12 below.


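EXERCISES


*1. $(\bigcup_j A_j) \setminus (\bigcup_j B_j) \subset \bigcup_j (A_j \setminus B_j)$. $(\bigcap_j A_j) \setminus (\bigcap_j B_j) \subset \bigcup_j (A_j \setminus B_j)$. When
is there equality?
*2. The best way to define the symmetric difference is through indicators
of sets as follows:
$$1_{A \triangle B} = 1_A + 1_B \pmod 2$$

where we have arithmetical addition modulo 2 on the right side. All properties
of $\triangle$ follow easily from this definition, some of which are rather tedious to
verify otherwise. As examples:

$$(A \triangle B) \triangle C = A \triangle (B \triangle C),$$
$$(A \triangle B) \triangle (B \triangle C) = A \triangle C,$$
$$(A \triangle B) \triangle (C \triangle D) = (A \triangle C) \triangle (B \triangle D),$$
$$A \triangle B = C \Leftrightarrow A = B \triangle C,$$
$$A \triangle B = C \triangle D \Leftrightarrow A \triangle C = B \triangle D.$$

3. If $\Omega$ has exactly $n$ points, then $\mathcal{J}$ has $2^n$ members. The B.F. generated
by $n$ given sets “without relations among them” has $2^{2^n}$ members.

4. If $\Omega$ is countable, then $\mathcal{J}$ is generated by the singletons, and
conversely. [HINT: All countable subsets of $\Omega$ and their complements form
a B.F.]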

5. The intersection of any collection of B.F.’s $\{\mathcal{F}_\alpha, \alpha \in A\}$ is the maximal
B.F. contained in all of them; it is indifferently denoted by $\bigcap_{\alpha \in A} \mathcal{F}_\alpha$ or $\bigwedge_{\alpha \in A} \mathcal{F}_\alpha$.

*6. The union of a countable collection of B.F.’s $\{\mathcal{F}_j\}$ such that $\mathcal{F}_j \subset \mathcal{F}_{j+1}$
need not be a B.F., but there is a minimal B.F. containing all of them, denoted
by $\bigvee_j \mathcal{F}_j$. In general $\bigvee_{\alpha \in A} \mathcal{F}_\alpha$ denotes the minimal B.F. containing all $\mathcal{F}_\alpha$, $\alpha \in A$.
[HINT: $\Omega$ = the set of positive integers; $\mathcal{F}_j$ = the B.F. generated by those up
to $j$.]

7. A B.F. is said to be countably generated iff it is generated by a
countable collection of sets. Prove that if each $\mathcal{F}_j$ is countably generated, then so
is $\bigvee_{j=1}^\infty \mathcal{F}_j$.

*8. Let $\mathcal{F}$ be a B.F. generated by an arbitrary collection of sets $\{E_\alpha, \alpha \in
A\}$. Prove that for each $E \in \mathcal{F}$, there exists a countable subcollection $\{E_{\alpha_j}, j \ge
1\}$ (depending on $E$) such that $E$ belongs already to the B.F. generated by this
subcollection. [HINT: Consider the class of all sets with the asserted property
and show that it is a B.F. containing each $E_\alpha$.]

9. If $\mathcal{F}$ is a B.F. generated by a countable collection of disjoint sets
$\{\Lambda_n\}$, such that $\bigcup_n \Lambda_n = \Omega$, then each member of $\mathcal{F}$ is just the union of a
countable subcollection of these $\Lambda_n$’s.

10. Let $\mathcal{D}$ be a class of subsets of $\Omega$ having the closure property (iii);
let $\mathcal{A}$ be a class of sets containing $\Omega$ as well as $\mathcal{D}$, and having the closure
properties (vi) and (x). Then $\mathcal{A}$ contains the B.F. generated by $\mathcal{D}$. (This is
Dynkin’s form of a monotone class theorem which is expedient for certain
applications. The proof proceeds as in Theorem 2.1.2 by replacing $\mathcal{F}_0$ and $\mathcal{G}$
with $\mathcal{D}$ and $\mathcal{A}$ respectively.)

11. Take $\Omega = \mathcal{R}^n$ or a separable metric space in Exercise 10 and let $\mathcal{D}$
be the class of all open sets. Let $\mathcal{H}$ be a class of real-valued functions on $\Omega$
satisfying the following conditions.

(a) $1 \in \mathcal{H}$ and $1_D \in \mathcal{H}$ for each $D \in \mathcal{D}$;
(b) $\mathcal{H}$ is a vector space, namely: if $f_1 \in \mathcal{H}$, $f_2 \in \mathcal{H}$ and $c_1$, $c_2$ are any
two real constants, then $c_1 f_1 + c_2 f_2 \in \mathcal{H}$;
(c) $\mathcal{H}$ is closed with respect to increasing limits of positive functions,
namely: if $f_n \in \mathcal{H}$, $0 \le f_n \le f_{n+1}$ for all $n$, and $f = \lim_n \uparrow f_n <
\infty$, then $f \in \mathcal{H}$.

Then $\mathcal{H}$ contains all Borel measurable functions on $\Omega$, namely all
finite-valued functions measurable with respect to the topological Borel field (= the
minimal B.F. containing all open sets of $\Omega$). [HINT: let $\mathcal{C} = \{E \subset \Omega : 1_E \in \mathcal{H}\}$;
apply Exercise 10 to show that $\mathcal{C}$ contains the B.F. just defined. Each positive
Borel measurable function is the limit of an increasing sequence of simple
(finitely-valued) functions.]

12. Let $\mathcal{C}$ be a M.C. of subsets of $\mathcal{R}^n$ (or a separable metric space)
containing all the open sets and closed sets. Prove that $\mathcal{C} \supset \mathcal{B}^n$ (the topological
Borel field defined in Exercise 11). [HINT: Show that the minimal such class
is a field.]


2.2 Probability measures and their distribution 
functions 


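Let $\Omega$ be a space, $\mathcal{F}$ a B.F. of subsets of $\Omega$. A probability measure $\mathcal{P}(\cdot)$ on $\mathcal{F}$
is a numerically valued set function with domain $\mathcal{F}$, satisfying the following
axioms:

(i) $\forall E \in \mathcal{F}\colon \mathcal{P}(E) \ge 0$.
(ii) If $\{E_j\}$ is a countable collection of (pairwise) disjoint sets in $\mathcal{F}$, then
$$\mathcal{P}\left(\bigcup_j E_j\right) = \sum_j \mathcal{P}(E_j).$$
(iii) $\mathcal{P}(\Omega) = 1$.

The abbreviation “p.m.” will be used for “probability measure”.
These axioms imply the following consequences, where all sets are
members of $\mathcal{F}$.

(iv) $\mathcal{P}(E) \le 1$.
(v) $\mathcal{P}(\emptyset) = 0$.
(vi) $\mathcal{P}(E^c) = 1 - \mathcal{P}(E)$.
(vii) $\mathcal{P}(E \cup F) + \mathcal{P}(E \cap F) = \mathcal{P}(E) + \mathcal{P}(F)$.
(viii) $E \subset F \Rightarrow \mathcal{P}(E) = \mathcal{P}(F) - \mathcal{P}(F \setminus E) \le \mathcal{P}(F)$.
(ix) Monotone property. $E_n \uparrow E$ or $E_n \downarrow E \Rightarrow \mathcal{P}(E_n) \to \mathcal{P}(E)$.
(x) Boole’s inequality. $\mathcal{P}(\bigcup_j E_j) \le \sum_j \mathcal{P}(E_j)$.

Axiom (ii) is called “countable additivity”; the corresponding axiom
restricted to a finite collection $\{E_j\}$ is called “finite additivity”.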

The following proposition
(1) $$E_n \downarrow \emptyset \Rightarrow \mathcal{P}(E_n) \to 0$$

is called the “axiom of continuity”. It is a particular case of the monotone
property (ix) above, which may be deduced from it or proved in the same way
as indicated below.


Theorem 2.2.1. The axioms of finite additivity and of continuity together
are equivalent to the axiom of countable additivity.


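PROOF. Let $E_n \downarrow$. We have the obvious identity:

$$E_n = \bigcup_{k=n}^{\infty} (E_k \setminus E_{k+1}) \cup \bigcap_{k=1}^{\infty} E_k.$$

If $E_n \downarrow \emptyset$, the last term is the empty set. Hence if (ii) is assumed, we have

$$\forall n \ge 1\colon \mathcal{P}(E_n) = \sum_{k=n}^{\infty} \mathcal{P}(E_k \setminus E_{k+1});$$

the series being convergent, we have $\lim_{n \to \infty} \mathcal{P}(E_n) = 0$. Hence (1) is true.
Conversely, let $\{E_k, k \ge 1\}$ be pairwise disjoint, then

$$\bigcup_{k=n+1}^{\infty} E_k \downarrow \emptyset$$

(why?) and consequently, if (1) is true, then
$$\lim_{n \to \infty} \mathcal{P}\left(\bigcup_{k=n+1}^{\infty} E_k\right) = 0.$$

Now if finite additivity is assumed, we have

$$\mathcal{P}\left(\bigcup_{k=1}^{\infty} E_k\right) = \mathcal{P}\left(\bigcup_{k=1}^{n} E_k\right) + \mathcal{P}\left(\bigcup_{k=n+1}^{\infty} E_k\right) = \sum_{k=1}^{n} \mathcal{P}(E_k) + \mathcal{P}\left(\bigcup_{k=n+1}^{\infty} E_k\right).$$

This shows that the infinite series $\sum_{k=1}^{\infty} \mathcal{P}(E_k)$ converges as it is bounded by
the first member above. Letting $n \to \infty$, we obtain

$$\mathcal{P}\left(\bigcup_{k=1}^{\infty} E_k\right) = \lim_{n \to \infty} \sum_{k=1}^{n} \mathcal{P}(E_k) + \lim_{n \to \infty} \mathcal{P}\left(\bigcup_{k=n+1}^{\infty} E_k\right) = \sum_{k=1}^{\infty} \mathcal{P}(E_k).$$

Hence (ii) is true.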


Remark. For a later application (Theorem 3.3.4) we note the following
extension. Let $\mathcal{P}$ be defined on a field $\mathcal{F}$ which is finitely additive and
satisfies axioms (i), (iii), and (1). Then (ii) holds whenever $\bigcup_k E_k \in \mathcal{F}$. For then
$\bigcup_{k=n+1}^{\infty} E_k$ also belongs to $\mathcal{F}$, and the second part of the proof above remains
valid.


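The triple $(\Omega, \mathcal{F}, \mathcal{P})$ is called a probability space (triple), $\Omega$ alone is
called the sample space, and $\omega$ is then a sample point.

Let $\Delta \subset \Omega$, then the trace of the B.F. $\mathcal{F}$ on $\Delta$ is the collection of all sets
of the form $\Delta \cap F$, where $F \in \mathcal{F}$. It is easy to see that this is a B.F. of subsets
of $\Delta$, and we shall denote it by $\Delta \cap \mathcal{F}$. Suppose $\Delta \in \mathcal{F}$ and $\mathcal{P}(\Delta) > 0$; then
we may define the set function $\mathcal{P}_\Delta$ on $\Delta \cap \mathcal{F}$ as follows:

$$\forall E \in \Delta \cap \mathcal{F}\colon \mathcal{P}_\Delta(E) = \frac{\mathcal{P}(E)}{\mathcal{P}(\Delta)}.$$

It is easy to see that $\mathcal{P}_\Delta$ is a p.m. on $\Delta \cap \mathcal{F}$. The triple $(\Delta, \Delta \cap \mathcal{F}, \mathcal{P}_\Delta)$ will
be called the trace of $(\Omega, \mathcal{F}, \mathcal{P})$ on $\Delta$.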
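
A small finite example may make the trace construction concrete. The sketch below assumes a fair six-sided die with the total B.F. (all subsets) and takes $\Delta$ to be the even outcomes; `OMEGA`, `P`, and `trace_measure` are names invented for this illustration.

```python
from fractions import Fraction

# Hypothetical finite space: a fair die, with every subset measurable.
OMEGA = frozenset(range(1, 7))

def P(E):
    """Uniform p.m.: each sample point carries mass 1/6."""
    return Fraction(len(frozenset(E) & OMEGA), len(OMEGA))

def trace_measure(delta):
    """Return the set function P_delta; its arguments must be subsets of delta."""
    delta = frozenset(delta)
    if P(delta) == 0:
        raise ValueError("P(delta) must be positive")

    def P_delta(E):
        E = frozenset(E)
        if not E <= delta:
            raise ValueError("E must be a subset of delta")
        return P(E) / P(delta)

    return P_delta

if __name__ == "__main__":
    P_even = trace_measure({2, 4, 6})
    print(P_even({2}), P_even({2, 4}), P_even({2, 4, 6}))
```

Here `P_even` assigns total mass 1 to $\Delta = \{2, 4, 6\}$ and is additive on disjoint subsets of $\Delta$, as a p.m. on $\Delta \cap \mathcal{F}$ should be.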


Example 1. Let $\Omega$ be a countable set: $\Omega = \{\omega_j, j \in J\}$, where $J$ is a countable index
set, and let $\mathcal{J}$ be the total B.F. of $\Omega$. Choose any sequence of numbers $\{p_j, j \in J\}$
satisfying

(2) $$\forall j \in J\colon p_j \ge 0; \quad \sum_{j \in J} p_j = 1;$$

and define a set function $\mathcal{P}$ on $\mathcal{J}$ as follows:

(3) $$\forall E \in \mathcal{J}\colon \mathcal{P}(E) = \sum_{\omega_j \in E} p_j.$$

In words, we assign $p_j$ as the value of the “probability” of the singleton $\{\omega_j\}$, and
for an arbitrary set of $\omega_j$’s we assign as its probability the sum of all the probabilities
assigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied. Hence $\mathcal{P}$ so
defined is a p.m.

Conversely, let any such $\mathcal{P}$ be given on $\mathcal{J}$. Since $\{\omega_j\} \in \mathcal{J}$ for every $j$, $\mathcal{P}(\{\omega_j\})$
is defined, let its value be $p_j$. Then (2) is satisfied. We have thus exhibited all the
possible p.m.’s on $\Omega$, or rather on the pair $(\Omega, \mathcal{J})$; this will be called a discrete sample
space. The entire first volume of Feller’s well-known book [13] with all its rich content
is based on just such spaces.
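
Formula (3) translates directly into code when the sample space is finite. The following sketch uses an assumed three-point space with weights 1/2, 1/3, 1/6 (any nonnegative weights summing to 1 would serve) and spot-checks consequences (vi) and (vii); the names `p`, `OMEGA`, and `P` are illustrative only.

```python
from fractions import Fraction

# Assumed weights p_j on a three-point sample space; they satisfy (2).
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
OMEGA = frozenset(p)

def P(E):
    """Formula (3): sum the weights of the sample points belonging to E."""
    return sum((p[w] for w in E), Fraction(0))

if __name__ == "__main__":
    E, F = frozenset("ab"), frozenset("bc")
    print(P(OMEGA))                               # axiom (iii)
    print(P(OMEGA - E), 1 - P(E))                 # consequence (vi)
    print(P(E | F) + P(E & F), P(E) + P(F))       # consequence (vii)
```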


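Example 2. Let $\mathcal{U} = (0, 1]$, $\mathcal{C}$ the collection of intervals:
$$\mathcal{C} = \{(a, b] : 0 < a < b \le 1\};$$

$\mathcal{B}$ the minimal B.F. containing $\mathcal{C}$, $m$ the Borel-Lebesgue measure on $\mathcal{B}$. Then
$(\mathcal{U}, \mathcal{B}, m)$ is a probability space.

Let $\mathcal{B}_0$ be the collection of subsets of $\mathcal{U}$ each of which is the union of a finite
number of members of $\mathcal{C}$. Thus a typical set $B$ in $\mathcal{B}_0$ is of the form

$$B = \bigcup_{j=1}^{n} (a_j, b_j] \quad \text{where } a_1 < b_1 < a_2 < b_2 < \cdots < a_n < b_n.$$

It is easily seen that $\mathcal{B}_0$ is a field that is generated by $\mathcal{C}$ and in turn generates $\mathcal{B}$.
If we take $\mathcal{U} = [0, 1]$ instead, then $\mathcal{B}_0$ is no longer a field since $\mathcal{U} \notin \mathcal{B}_0$, but
$\mathcal{B}$ and $m$ may be defined as before. The new $\mathcal{B}$ is generated by the old $\mathcal{B}$ and the
singleton $\{0\}$.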


Example 3. Let $\mathcal{R}^1 = (-\infty, +\infty)$, $\mathcal{C}$ the collection of intervals of the form $(a, b]$,
$-\infty < a < b < +\infty$. The field $\mathcal{B}_0$ generated by $\mathcal{C}$ consists of finite unions of disjoint
sets of the form $(a, b]$, $(-\infty, a]$ or $(b, \infty)$. The Euclidean B.F. $\mathcal{B}^1$ on $\mathcal{R}^1$ is the B.F.
generated by $\mathcal{C}$ or $\mathcal{B}_0$. A set in $\mathcal{B}^1$ will be called a (linear) Borel set when there is no
danger of ambiguity. However, the Borel-Lebesgue measure $m$ on $\mathcal{R}^1$ is not a p.m.;
indeed $m(\mathcal{R}^1) = +\infty$ so that $m$ is not a finite measure but it is $\sigma$-finite on $\mathcal{B}_0$, namely:
there exists a sequence of sets $E_n \in \mathcal{B}_0$, $E_n \uparrow \mathcal{R}^1$ with $m(E_n) < \infty$ for each $n$.


EXERCISES 


1. For any countably infinite set $\Omega$, the collection of its finite subsets
and their complements forms a field $\mathcal{F}$. If we define $\mathcal{P}(E)$ on $\mathcal{F}$ to be 0 or 1
according as $E$ is finite or not, then $\mathcal{P}$ is finitely additive but not countably so.

*2. Let $\Omega$ be the space of natural numbers. For each $E \subset \Omega$ let $N_n(E)$
be the cardinality of the set $E \cap [0, n]$ and let $\mathcal{C}$ be the collection of $E$’s for
which the following limit exists:

$$\mathcal{P}(E) = \lim_{n \to \infty} \frac{N_n(E)}{n}.$$

$\mathcal{P}$ is finitely additive on $\mathcal{C}$ and is called the “asymptotic density” of $E$. Let $E =$
{all odd integers}, $F =$ {all odd integers in $[2^{2n}, 2^{2n+1}]$ and all even integers
in $(2^{2n+1}, 2^{2n+2}]$ for $n \ge 0$}. Show that $E \in \mathcal{C}$, $F \in \mathcal{C}$, but $E \cap F \notin \mathcal{C}$. Hence
$\mathcal{C}$ is not a field.

3. In the preceding example show that for each real number $\alpha$ in [0, 1]
there is an $E$ in $\mathcal{C}$ such that $\mathcal{P}(E) = \alpha$. Is the set of all primes in $\mathcal{C}$? Give
an example of $E$ that is not in $\mathcal{C}$.

4. Prove the nonexistence of a p.m. on $(\Omega, \mathcal{J})$, where $(\Omega, \mathcal{J})$ is as in
Example 1, such that the probability of each singleton has the same value.
Hence criticize a sentence such as: “Choose an integer at random”.

5. Prove that the trace of a B.F. $\mathcal{F}$ on any subset $\Delta$ of $\Omega$ is a B.F.
Prove that the trace of $(\Omega, \mathcal{F}, \mathcal{P})$ on any $\Delta$ in $\mathcal{F}$ is a probability space, if
$\mathcal{P}(\Delta) > 0$.

*6. Now let $\Delta \notin \mathcal{F}$ be such that

$$\Delta \subset F \in \mathcal{F} \Rightarrow \mathcal{P}(F) = 1.$$

Such a set is called thick in $(\Omega, \mathcal{F}, \mathcal{P})$. If $E = \Delta \cap F$, $F \in \mathcal{F}$, define $\mathcal{P}^*(E) =
\mathcal{P}(F)$. Then $\mathcal{P}^*$ is a well-defined (what does it mean?) p.m. on $(\Delta, \Delta \cap \mathcal{F})$.
This procedure is called the adjunction of $\Delta$ to $(\Omega, \mathcal{F}, \mathcal{P})$.

7. The B.F. $\mathcal{B}^1$ on $\mathcal{R}^1$ is also generated by the class of all open intervals
or all closed intervals, or all half-lines of the form $(-\infty, a]$ or $(a, \infty)$, or these
intervals with rational endpoints. But it is not generated by all the singletons
of $\mathcal{R}^1$ nor by any finite collection of subsets of $\mathcal{R}^1$.

8. $\mathcal{B}^1$ contains every singleton, countable set, open set, closed set, $G_\delta$
set, $F_\sigma$ set. (For the last two kinds of sets see, e.g., Natanson [3].)

*9. Let $\mathcal{C}$ be a countable collection of pairwise disjoint subsets $\{E_j, j \ge 1\}$
of $\mathcal{R}^1$, and let $\mathcal{F}$ be the B.F. generated by $\mathcal{C}$. Determine the most general
p.m. on $\mathcal{F}$ and show that the resulting probability space is “isomorphic” to
that discussed in Example 1.

10. Instead of requiring that the $E_j$’s be pairwise disjoint, we may make
the broader assumption that each of them intersects only a finite number in
the collection. Carry through the rest of the problem.


The question of probability measures on $\mathcal{B}^1$ is closely related to the
theory of distribution functions studied in Chapter 1. There is in fact a
one-to-one correspondence between the set functions on the one hand, and the point
functions on the other. Both points of view are useful in probability theory.
We establish first the easier half of this correspondence.


Lemma. Each p.m. $\mu$ on $\mathcal{B}^1$ determines a d.f. $F$ through the correspondence

(4) $$\forall x \in \mathcal{R}^1\colon \mu((-\infty, x]) = F(x).$$

As a consequence, we have for $-\infty < a < b < +\infty$:

(5) $$\begin{aligned}
\mu((a, b]) &= F(b) - F(a), \\
\mu((a, b)) &= F(b-) - F(a), \\
\mu([a, b)) &= F(b-) - F(a-), \\
\mu([a, b]) &= F(b) - F(a-).
\end{aligned}$$

Furthermore, let $D$ be any dense subset of $\mathcal{R}^1$, then the correspondence is
already determined by that in (4) restricted to $x \in D$, or by any of the four
relations in (5) when $a$ and $b$ are both restricted to $D$.


PROOF. Let us write
$$\forall x \in \mathcal{R}^1\colon I_x = (-\infty, x].$$

Then $I_x \in \mathcal{B}^1$ so that $\mu(I_x)$ is defined; call it $F(x)$ and so define the function
$F$ on $\mathcal{R}^1$. We shall show that $F$ is a d.f. as defined in Chapter 1. First of all, $F$
is increasing by property (viii) of the measure. Next, if $x_n \downarrow x$, then $I_{x_n} \downarrow I_x$,
hence we have by (ix)

(6) $$F(x_n) = \mu(I_{x_n}) \downarrow \mu(I_x) = F(x).$$

Hence $F$ is right continuous. [The reader should ascertain what changes should
be made if we had defined $F$ to be left continuous.] Similarly as $x \downarrow -\infty$, $I_x \downarrow
\emptyset$; as $x \uparrow +\infty$, $I_x \uparrow \mathcal{R}^1$. Hence it follows from (ix) again that

$$\lim_{x \downarrow -\infty} F(x) = \lim_{x \downarrow -\infty} \mu(I_x) = \mu(\emptyset) = 0;$$
$$\lim_{x \uparrow +\infty} F(x) = \lim_{x \uparrow +\infty} \mu(I_x) = \mu(\Omega) = 1.$$

This ends the verification that $F$ is a d.f. The relations in (5) follow easily
from the following complement to (4):

$$\mu((-\infty, x)) = F(x-).$$

To see this let $x_n < x$ and $x_n \uparrow x$. Since $I_{x_n} \uparrow (-\infty, x)$, we have by (ix):
$$F(x-) = \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} \mu((-\infty, x_n]) = \mu((-\infty, x)).$$

To prove the last sentence in the theorem we show first that (4) restricted
to $x \in D$ implies (4) unrestricted. For this purpose we note that $\mu((-\infty, x])$,
as well as $F(x)$, is right continuous as a function of $x$, as shown in (6).
Hence the two members of the equation in (4), being both right continuous
functions of $x$ and coinciding on a dense set, must coincide everywhere. Now
suppose, for example, the second relation in (5) holds for rational $a$ and $b$. For
each real $x$ let $a_n$, $b_n$ be rational such that $a_n \downarrow -\infty$ and $b_n > x$, $b_n \downarrow x$. Then
$\mu((a_n, b_n)) \to \mu((-\infty, x])$ and $F(b_n-) - F(a_n) \to F(x)$. Hence (4) follows.
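
The four relations in (5) differ only in whether $F$ or its left limit is used at each endpoint, which is easy to see on a d.f. with a jump. The sketch below assumes a particular $F$ (a jump of 3/10 at 0, then a linear rise to 1 on [0, 1]) whose left limit `F_left` is written out by hand; these functions and the helper `mu` are assumptions of the example.

```python
from fractions import Fraction

def F(x):
    """Assumed d.f.: jump of 3/10 at 0, then a linear rise to 1 on [0, 1]."""
    if x < 0:
        return Fraction(0)
    return Fraction(3, 10) + Fraction(7, 10) * min(Fraction(x), Fraction(1))

def F_left(x):
    """Left limit F(x-), written out by hand for this particular F."""
    if x <= 0:
        return Fraction(0)
    return Fraction(3, 10) + Fraction(7, 10) * min(Fraction(x), Fraction(1))

def mu(a, b, closed_left=False, closed_right=True):
    """Measure of the interval with endpoints a <= b, following (5)."""
    upper = F(b) if closed_right else F_left(b)
    lower = F_left(a) if closed_left else F(a)
    return upper - lower

if __name__ == "__main__":
    a, b = Fraction(0), Fraction(1, 2)
    print(mu(a, b))                                        # (a, b]
    print(mu(a, b, closed_right=False))                    # (a, b)
    print(mu(a, b, closed_left=True, closed_right=False))  # [a, b)
    print(mu(a, b, closed_left=True))                      # [a, b]
    print(mu(a, a, closed_left=True))                      # the singleton {a}
```

Including the left endpoint 0 adds exactly the jump $F(0) - F(0-) = 3/10$, which is also the mass of the singleton $\{0\}$.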


Incidentally, the correspondence (4) “justifies” our previous assumption
that $F$ be right continuous, but what if we have assumed it to be left continuous?
Now we proceed to the second half of the correspondence.


Theorem 2.2.2. Each d.f. $F$ determines a p.m. $\mu$ on $\mathcal{B}^1$ through any one of
the relations given in (5), or alternatively through (4).


This is the classical theory of Lebesgue-Stieltjes measure; see, e.g.,
Halmos [4] or Royden [5]. However, we shall sketch the basic ideas as an
important review. The d.f. $F$ being given, we may define a set function for
intervals of the form $(a, b]$ by means of the first relation in (5). Such a function
is seen to be countably additive on its domain of definition. (What does this
mean?) Now we proceed to extend its domain of definition while preserving
this additivity. If $S$ is a countable union of such intervals which are disjoint:

$$S = \bigcup_i (a_i, b_i]$$

we are forced to define $\mu(S)$, if at all, by

$$\mu(S) = \sum_i \mu((a_i, b_i]) = \sum_i \{F(b_i) - F(a_i)\}.$$

But a set $S$ may be representable in the form above in different ways, so
we must check that this definition leads to no contradiction: namely that it
depends really only on the set $S$ and not on the representation. Next, we notice
that any open interval $(a, b)$ is in the extended domain (why?) and indeed the
extended definition agrees with the second relation in (5). Now it is well
known that any open set $U$ in $\mathcal{R}^1$ is the union of a countable collection of
disjoint open intervals [there is no exact analogue of this in $\mathcal{R}^n$ for $n > 1$], say
$U = \bigcup_i (c_i, d_i)$; and this representation is unique. Hence again we are forced
to define $\mu(U)$, if at all, by

$$\mu(U) = \sum_i \mu((c_i, d_i)) = \sum_i \{F(d_i-) - F(c_i)\}.$$


Having thus defined the measure for all open sets, we find that its values for 
all closed sets are thereby also determined by property (vi) of a probability 
measure. In particular, its value for each singleton {a} is determined to be 
F(a) — F(a—), which is nothing but the jump of F at a. Now we also know its 
value on all countable sets, and so on; all this provided that no contradiction
is ever forced on us so far as we have gone. But even with the class of open
and closed sets we are still far from the B.F. $\mathscr{B}^1$. The next step will be the $G_\delta$
sets and the $F_\sigma$ sets, and there already the picture is not so clear. Although
it has been shown to be possible to proceed this way by transfinite induction,
this is a rather difficult task. There is a more efficient way to reach the goal
via the notions of outer and inner measures as follows. For any subset $S$ of
$\mathscr{R}^1$ consider the two numbers:

$$\mu^*(S) = \inf_{U \text{ open},\ U \supset S} \mu(U),$$

$$\mu_*(S) = \sup_{C \text{ closed},\ C \subset S} \mu(C).$$

$\mu^*$ is the outer measure, $\mu_*$ the inner measure (both with respect to the given
$F$). It is clear that $\mu^*(S) \ge \mu_*(S)$. Equality does not in general hold, but when
it does, we call S “measurable” (with respect to F). In this case the common 
value will be denoted by $\mu(S)$. This new definition requires us at once to
check that it agrees with the old one for all the sets for which $\mu$ has already
been defined. The next task is to prove that: (a) the class of all measurable
sets forms a B.F., say $\mathscr{L}$; (b) on this $\mathscr{L}$, the function $\mu$ is a p.m. Details of
these proofs are to be found in the references given above. To finish: since
$\mathscr{L}$ is a B.F., and it contains all intervals of the form $(a, b]$, it contains the
minimal B.F. $\mathscr{B}^1$ with this property. It may be larger than $\mathscr{B}^1$, indeed it is
(see below), but this causes no harm, for the restriction of $\mu$ to $\mathscr{B}^1$ is a p.m.
whose existence is asserted in Theorem 2.2.2. 

Let us mention that the introduction of both the outer and inner measures 
is useful for approximations. It follows, for example, that for each measurable 
set $S$ and $\epsilon > 0$, there exists an open set $U$ and a closed set $C$ such that
$U \supset S \supset C$ and

$$\mu(U) - \epsilon \le \mu(S) \le \mu(C) + \epsilon. \tag{7}$$


There is an alternative way of defining measurability through the use of the 
outer measure alone and based on Carathéodory’s criterion. 

It should also be remarked that the construction described above for 
$(\mathscr{R}^1, \mathscr{B}^1, \mu)$ is that of a “topological measure space”, where the B.F. is generated
by the open sets of a given topology on $\mathscr{R}^1$, here the usual Euclidean
one. In the general case of an “algebraic measure space”, in which there is no
topological structure, the role of the open sets is taken by an arbitrary field $\mathscr{F}_0$,
and a measure given on $\mathscr{F}_0$ may be extended to the minimal B.F. $\mathscr{F}$ containing
$\mathscr{F}_0$ in a similar way. In the case of $\mathscr{R}^1$, such an $\mathscr{F}_0$ is given by the field $\mathscr{B}_0$ of
sets, each of which is the union of a finite number of intervals of the form $(a, b]$,
$(-\infty, b]$, or $(a, \infty)$, where $a \in \mathscr{R}^1$, $b \in \mathscr{R}^1$. Indeed the definition of the
outer measure given above may be replaced by the equivalent one:

$$\mu^*(E) = \inf \sum_n \mu(U_n), \tag{8}$$

where the infimum is taken over all countable unions $\bigcup_n U_n$ such that each
$U_n \in \mathscr{B}_0$ and $\bigcup_n U_n \supset E$. For another case where such a construction is
required see Sec. 3.3 below. 

There is one more question: besides the $\mu$ discussed above is there any
other p.m. $\nu$ that corresponds to the given $F$ in the same way? It is important
to realize that this question is not answered by the preceding theorem. It is
also worthwhile to remark that any p.m. $\nu$ that is defined on a domain strictly
containing $\mathscr{B}^1$ and that coincides with $\mu$ on $\mathscr{B}^1$ (such as the $\mu$ on $\mathscr{L}$ as
mentioned above) will certainly correspond to $F$ in the same way, and strictly
speaking such a $\nu$ is to be considered as distinct from $\mu$. Hence we should
phrase the question more precisely by considering only p.m.’s on $\mathscr{B}^1$. This
will be answered in full generality by the next theorem. 


Theorem 2.2.3. Let $\mu$ and $\nu$ be two measures defined on the same B.F. $\mathscr{F}$,
which is generated by the field $\mathscr{F}_0$. If either $\mu$ or $\nu$ is $\sigma$-finite on $\mathscr{F}_0$, and
$\mu(E) = \nu(E)$ for every $E \in \mathscr{F}_0$, then the same is true for every $E \in \mathscr{F}$, and
thus $\mu = \nu$.


PROOF. We give the proof only in the case where $\mu$ and $\nu$ are both finite,
leaving the rest as an exercise. Let

$$\mathscr{C} = \{E \in \mathscr{F} : \mu(E) = \nu(E)\},$$

then $\mathscr{C} \supset \mathscr{F}_0$ by hypothesis. But $\mathscr{C}$ is also a monotone class, for if $E_n \in \mathscr{C}$ for
every $n$ and $E_n \uparrow E$ or $E_n \downarrow E$, then by the monotone property of $\mu$ and $\nu$,
respectively,

$$\mu(E) = \lim_n \mu(E_n) = \lim_n \nu(E_n) = \nu(E).$$

It follows from Theorem 2.1.2 that $\mathscr{C} \supset \mathscr{F}$, which proves the theorem.
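
To see on a concrete case why the theorem asks for agreement on a generating field rather than on an arbitrary generating collection (compare Exercise 16 below), here is a minimal Python sketch on a four-point space. The space, the collection `C` and the two measures are hypothetical choices made for illustration; `generated_sigma_field` is a brute-force helper written for this example, not a standard routine.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})

def generated_sigma_field(collection):
    """Close a collection of subsets of OMEGA under complement and union."""
    sets = {frozenset(), OMEGA} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            if OMEGA - a not in sets:
                sets.add(OMEGA - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets

def measure(weights):
    """Return the set function E -> sum of the point masses in E."""
    return lambda e: sum((weights[w] for w in e), Fraction(0))

# A generating collection that is NOT a field (not closed under intersection).
C = [{1, 2}, {2, 3}]
mu = measure({1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)})
nu = measure({1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)})

sigma = generated_sigma_field(C)
agree_on_C = all(mu(frozenset(s)) == nu(frozenset(s)) for s in C)
differ_somewhere = any(mu(e) != nu(e) for e in sigma)
print(len(sigma), agree_on_C, differ_somewhere)
```

Here the two measures agree on both members of `C`, yet they differ on the singleton $\{2\}$, which belongs to the generated B.F.; the monotone class argument above has nothing to start from because `C` is not a field.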


Remark. In order that $\mu$ and $\nu$ coincide on $\mathscr{F}_0$, it is sufficient that they
coincide on a collection $\mathscr{G}$ such that finite disjoint unions of members of $\mathscr{G}$
constitute $\mathscr{F}_0$.


Corollary. Let $\mu$ and $\nu$ be $\sigma$-finite measures on $\mathscr{B}^1$ that agree on all intervals
of one of the eight kinds: $(a, b]$, $(a, b)$, $[a, b)$, $[a, b]$, $(-\infty, b]$, $(-\infty, b)$, $[a, \infty)$,
$(a, \infty)$ or merely on those with the endpoints in a given dense set $D$, then
they agree on $\mathscr{B}^1$.

PROOF. In order to apply the theorem, we must verify that any of the
hypotheses implies that $\mu$ and $\nu$ agree on a field that generates $\mathscr{B}^1$. Let us take
intervals of the first kind and consider the field $\mathscr{B}_0$ defined above. If $\mu$ and $\nu$
agree on such intervals, they must agree on $\mathscr{B}_0$ by countable additivity. This
finishes the proof.


Returning to Theorems 2.2.1 and 2.2.2, we can now add the following 
complement, of which the first part is trivial. 


Theorem 2.2.4. Given the p.m. $\mu$ on $\mathscr{B}^1$, there is a unique d.f. $F$ satisfying
(4). Conversely, given the d.f. $F$, there is a unique p.m. satisfying (4) or any
of the relations in (5).


We shall simply call $\mu$ the p.m. of $F$, and $F$ the d.f. of $\mu$.
Instead of $(\mathscr{R}^1, \mathscr{B}^1)$ we may consider its restriction to a fixed interval $[a, b]$.
Without loss of generality we may suppose this to be $\mathscr{U} = [0, 1]$ so that we
are in the situation of Example 2. We can either proceed analogously or reduce
it to the case just discussed, as follows. Let $F$ be a d.f. such that $F = 0$ for $x < 0$
and $F = 1$ for $x > 1$. The probability measure $\mu$ of $F$ will then have support
in $[0, 1]$, since $\mu((-\infty, 0)) = 0 = \mu((1, \infty))$ as a consequence of (4). Thus
the trace of $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ on $\mathscr{U}$ may be denoted simply by $(\mathscr{U}, \mathscr{B}, \mu)$, where
$\mathscr{B}$ is the trace of $\mathscr{B}^1$ on $\mathscr{U}$. Conversely, any p.m. on $\mathscr{B}$ may be regarded as
such a trace. The most interesting case is when $F$ is the “uniform distribution”
on $\mathscr{U}$:

$$F(x) = \begin{cases} 0 & \text{for } x < 0, \\ x & \text{for } 0 \le x \le 1, \\ 1 & \text{for } x > 1. \end{cases}$$

The corresponding measure $m$ on $\mathscr{B}$ is the usual Borel measure on $[0, 1]$,
while its extension on $\mathscr{L}$ as described in Theorem 2.2.2 is the usual Lebesgue
measure there. It is well known that $\mathscr{L}$ is actually larger than $\mathscr{B}$; indeed $(\mathscr{L}, m)$
is the completion of $(\mathscr{B}, m)$ to be discussed below.


DEFINITION. The probability space $(\Omega, \mathscr{F}, \mathscr{P})$ is said to be complete iff
any subset of a set $F$ in $\mathscr{F}$ with $\mathscr{P}(F) = 0$ also belongs to $\mathscr{F}$.


Any probability space $(\Omega, \mathscr{F}, \mathscr{P})$ can be completed according to the next
theorem. Let us call a set in $\mathscr{F}$ with probability zero a null set. A property
that holds except on a null set is said to hold almost everywhere (a.e.), almost 
surely (a.s.), or for almost every w. 


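Theorem 2.2.5. Given the probability space $(\Omega, \mathscr{F}, \mathscr{P})$, there exists a
complete space $(\Omega, \overline{\mathscr{F}}, \overline{\mathscr{P}})$ such that $\mathscr{F} \subset \overline{\mathscr{F}}$ and $\overline{\mathscr{P}} = \mathscr{P}$ on $\mathscr{F}$.

PROOF. Let $\mathscr{N}$ be the collection of sets that are subsets of null sets, and
let $\overline{\mathscr{F}}$ be the collection of subsets of $\Omega$ each of which differs from a set in $\mathscr{F}$
by a subset of a null set. Precisely:

$$\overline{\mathscr{F}} = \{E \subset \Omega : E \,\triangle\, F \in \mathscr{N} \text{ for some } F \in \mathscr{F}\}. \tag{9}$$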


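It is easy to verify, using Exercise 1 of Sec. 2.1, that $\overline{\mathscr{F}}$ is a B.F. Clearly it
contains $\mathscr{F}$. For each $E \in \overline{\mathscr{F}}$, we put

$$\overline{\mathscr{P}}(E) = \mathscr{P}(F),$$

where $F$ is any set that satisfies the condition indicated in (9). To show that
this definition does not depend on the choice of such an $F$, suppose that

$$E \,\triangle\, F_1 \in \mathscr{N}, \quad E \,\triangle\, F_2 \in \mathscr{N}.$$

Then by Exercise 2 of Sec. 2.1,

$$(E \,\triangle\, F_1) \,\triangle\, (E \,\triangle\, F_2) = (F_1 \,\triangle\, F_2) \,\triangle\, (E \,\triangle\, E) = F_1 \,\triangle\, F_2.$$

Hence $F_1 \,\triangle\, F_2 \in \mathscr{N}$ and so $\mathscr{P}(F_1 \,\triangle\, F_2) = 0$. This implies $\mathscr{P}(F_1) = \mathscr{P}(F_2)$,
as was to be shown. We leave it as an exercise to show that $\overline{\mathscr{P}}$ is a measure
on $\overline{\mathscr{F}}$. If $E \in \mathscr{F}$, then $E \,\triangle\, E = \emptyset \in \mathscr{N}$, hence $\overline{\mathscr{P}}(E) = \mathscr{P}(E)$.

Finally, it is easy to verify that if $E \in \overline{\mathscr{F}}$ and $\overline{\mathscr{P}}(E) = 0$, then $E \in \mathscr{N}$.
Hence any subset of $E$ also belongs to $\mathscr{N}$ and so to $\overline{\mathscr{F}}$. This proves that
$(\Omega, \overline{\mathscr{F}}, \overline{\mathscr{P}})$ is complete.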
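
The construction in the proof can be carried out by brute force on a finite space. The sketch below is only an illustration under assumed data: a three-point space, a B.F. with four members, and a p.m. for which $\{1, 2\}$ is a null set whose proper subsets are not measurable. The names `F_sets`, `N` and `F_bar` are hypothetical labels for $\mathscr{F}$, $\mathscr{N}$ and $\overline{\mathscr{F}}$.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset({0, 1, 2})

def powerset(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# A small B.F. on OMEGA and a p.m. on it; {1, 2} is a null set whose
# proper subsets {1} and {2} are NOT in the B.F., so the space is incomplete.
F_sets = {frozenset(), frozenset({0}), frozenset({1, 2}), OMEGA}
P = {frozenset(): Fraction(0), frozenset({0}): Fraction(1),
     frozenset({1, 2}): Fraction(0), OMEGA: Fraction(1)}

# N: all subsets of null sets.
null_sets = [f for f in F_sets if P[f] == 0]
N = {e for n in null_sets for e in powerset(n)}

# Completion: E belongs iff (E symmetric-difference F) is in N for some F in F_sets.
F_bar = {}
for E in powerset(OMEGA):
    values = {P[f] for f in F_sets if (E ^ f) in N}
    if values:
        assert len(values) == 1   # the value does not depend on the choice of F
        F_bar[E] = values.pop()

print(len(F_bar), F_bar[frozenset({0, 1})], F_bar[frozenset({2})])
```

In this toy case the completed B.F. is the whole power set, and the extended measure agrees with the original one on the four original sets.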


What is the advantage of completion? Suppose that a certain property, 
such as the existence of a certain limit, is known to hold outside a certain set 
$N$ with $\mathscr{P}(N) = 0$. Then the exact set on which it fails to hold is a subset
of $N$, not necessarily in $\mathscr{F}$, but will be in $\overline{\mathscr{F}}$ with $\overline{\mathscr{P}}(N) = 0$. We need the
measurability of the exact exceptional set to facilitate certain dispositions, such 
as defining or redefining a function on it; see Exercise 25 below. 


EXERCISES 


In the following, $\mu$ is a p.m. on $\mathscr{B}^1$ and $F$ is its d.f.

*11. An atom of any measure $\mu$ on $\mathscr{B}^1$ is a singleton $\{x\}$ such that
$\mu(\{x\}) > 0$. The number of atoms of any $\sigma$-finite measure is countable. For
each $x$ we have $\mu(\{x\}) = F(x) - F(x-)$.

12. $\mu$ is called atomic iff its value is zero on any set not containing any
atom. This is the case if and only if $F$ is discrete. $\mu$ is without any atom or
atomless if and only if $F$ is continuous.

13. $\mu$ is called singular iff there exists a set $Z$ with $m(Z) = 0$ such
that $\mu(Z^c) = 0$. This is the case if and only if $F$ is singular. [HINT: One half
is proved by using Theorems 1.3.1 and 2.1.2 to get $\int_B F'(x)\,dx \le \mu(B)$ for
$B \in \mathscr{B}^1$; the other requires Vitali’s covering theorem.]

14. Translate Theorem 1.3.2 in terms of measures.

*15. Translate the construction of a singular continuous d.f. in Sec. 1.3 in 
terms of measures. [It becomes clearer and easier to describe!] Generalize the 
construction by replacing the Cantor set with any perfect set of Borel measure 
zero. What if the latter has positive measure? Describe a probability scheme 
to realize this. 

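16. Show by a trivial example that Theorem 2.2.3 becomes false if the
field $\mathscr{F}_0$ is replaced by an arbitrary collection that generates $\mathscr{F}$.

17. Show that Theorem 2.2.3 may be false for measures that are $\sigma$-finite
on $\mathscr{F}$. [HINT: Take $\Omega$ to be $\{1, 2, \ldots, \infty\}$ and $\mathscr{F}_0$ to be the finite sets excluding
$\infty$ and their complements, $\mu(E)$ = number of points in $E$, $\mu(\infty) \ne \nu(\infty)$.]

18. Show that the $\overline{\mathscr{F}}$ in (9) is also the collection of sets of the form
$F \cup N$ [or $F \setminus N$] where $F \in \mathscr{F}$ and $N \in \mathscr{N}$.

19. Let $\mathscr{N}$ be as in the proof of Theorem 2.2.5 and $\mathscr{N}_0$ be the set of all
null sets in $(\Omega, \mathscr{F}, \mathscr{P})$. Then both these collections are monotone classes, and
closed with respect to the operation “$\setminus$”.

*20. Let $(\Omega, \mathscr{F}, \mathscr{P})$ be a probability space and $\mathscr{F}_1$ a Borel subfield of
$\mathscr{F}$. Prove that there exists a minimal B.F. $\mathscr{F}_2$ satisfying $\mathscr{F}_1 \subset \mathscr{F}_2 \subset \mathscr{F}$ and
$\mathscr{N}_0 \subset \mathscr{F}_2$, where $\mathscr{N}_0$ is as in Exercise 19. A set $E$ belongs to $\mathscr{F}_2$ if and only
if there exists a set $F$ in $\mathscr{F}_1$ such that $E \,\triangle\, F \in \mathscr{N}_0$. This $\mathscr{F}_2$ is called the
augmentation of $\mathscr{F}_1$ with respect to $(\Omega, \mathscr{F}, \mathscr{P})$.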

21. Suppose that $\tilde{F}$ has all the defining properties of a d.f. except that it
is not assumed to be right continuous. Show that Theorem 2.2.2 and Lemma
remain valid with $F$ replaced by $\tilde{F}$, provided that we replace $F(x)$, $F(b)$, $F(a)$
in (4) and (5) by $\tilde{F}(x+)$, $\tilde{F}(b+)$, $\tilde{F}(a+)$, respectively. What modification is
necessary in Theorem 2.2.4?

22. For an arbitrary measure $\mathscr{P}$ on a B.F. $\mathscr{F}$, a set $E$ in $\mathscr{F}$ is called an
atom of $\mathscr{P}$ iff $\mathscr{P}(E) > 0$ and $F \subset E$, $F \in \mathscr{F}$ imply $\mathscr{P}(F) = \mathscr{P}(E)$ or $\mathscr{P}(F) = 0$.
$\mathscr{P}$ is called atomic iff its value is zero over any set in $\mathscr{F}$ that is disjoint
from all the atoms. Prove that for a measure on $\mathscr{B}^1$ this new definition is
equivalent to that given in Exercise 11 above provided we identify two sets
which differ by a $\mathscr{P}$-null set.

23. Prove that if the p.m. $\mathscr{P}$ is atomless, then given any $\alpha$ in $[0, 1]$
there exists a set $E \in \mathscr{F}$ with $\mathscr{P}(E) = \alpha$. [HINT: Prove first that there exists $E$
with “arbitrarily small” probability. A quick proof then follows from Zorn’s
lemma by considering a maximal collection of disjoint sets, the sum of whose
probabilities does not exceed $\alpha$. But an elementary proof without using any
maximality principle is also possible.]

*24. A point $x$ is said to be in the support of a measure $\mu$ on $\mathscr{B}^n$ iff
every open neighborhood of $x$ has strictly positive measure. The set of all
such points is called the support of $\mu$. Prove that the support is a closed set
whose complement is the maximal open set on which $\mu$ vanishes. Show that
the support of a p.m. on $\mathscr{B}^1$ is the same as that of its d.f., defined in Exercise 6
of Sec. 1.2.

*25. Let $f$ be measurable with respect to $\mathscr{F}$, and $Z$ be contained in a null
set. Define

$$\tilde{f} = \begin{cases} f & \text{on } Z^c, \\ K & \text{on } Z, \end{cases}$$

where $K$ is a constant. Then $\tilde{f}$ is measurable with respect to $\mathscr{F}$ provided that
$(\Omega, \mathscr{F}, \mathscr{P})$ is complete. Show that the conclusion may be false otherwise.


3 Random variable. Expectation. Independence


3.1 General definitions 


Let the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ be given. $\mathscr{R}^1 = (-\infty, +\infty)$ the (finite)
real line, $\mathscr{R}^* = [-\infty, +\infty]$ the extended real line, $\mathscr{B}^1$ = the Euclidean Borel
field on $\mathscr{R}^1$, $\mathscr{B}^*$ = the extended Borel field. A set in $\mathscr{B}^*$ is just a set in $\mathscr{B}^1$
possibly enlarged by one or both points $\pm\infty$.


DEFINITION OF A RANDOM VARIABLE. A real, extended-valued random variable
is a function $X$ whose domain is a set $\Delta$ in $\mathscr{F}$ and whose range is
contained in $\mathscr{R}^* = [-\infty, +\infty]$ such that for each $B$ in $\mathscr{B}^*$, we have

$$\{\omega : X(\omega) \in B\} \in \Delta \cap \mathscr{F} \tag{1}$$

where $\Delta \cap \mathscr{F}$ is the trace of $\mathscr{F}$ on $\Delta$. A complex-valued random variable is
a function on a set $\Delta$ in $\mathscr{F}$ to the complex plane whose real and imaginary
parts are both real, finite-valued random variables.


This definition in its generality is necessary for logical reasons in many 
applications, but for a discussion of basic properties we may suppose $\Delta = \Omega$
and that $X$ is real and finite-valued with probability one. This restricted meaning
of a “random variable”, abbreviated as “r.v.”, will be understood in the book
unless otherwise specified. The general case may be reduced to this one by
considering the trace of $(\Omega, \mathscr{F}, \mathscr{P})$ on $\Delta$, or on the “domain of finiteness”
$\Delta_0 = \{\omega : |X(\omega)| < \infty\}$, and taking real and imaginary parts.
Consider the “inverse mapping” $X^{-1}$ from $\mathscr{R}^1$ to $\Omega$, defined (as usual)
as follows:

$$\forall A \subset \mathscr{R}^1 : X^{-1}(A) = \{\omega : X(\omega) \in A\}.$$

Condition (1) then states that $X^{-1}$ carries members of $\mathscr{B}^1$ onto members of $\mathscr{F}$:

$$\forall B \in \mathscr{B}^1 : X^{-1}(B) \in \mathscr{F}; \tag{2}$$

or in the briefest notation:

$$X^{-1}(\mathscr{B}^1) \subset \mathscr{F}.$$

Such a function is said to be measurable (with respect to $\mathscr{F}$). Thus, an r.v. is
just a measurable function from $\Omega$ to $\mathscr{R}^1$ (or $\mathscr{R}^*$).
The next proposition, a standard exercise on inverse mapping, is essential. 


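Theorem 3.1.1. For any function $X$ from $\Omega$ to $\mathscr{R}^1$ (or $\mathscr{R}^*$), not necessarily
an r.v., the inverse mapping $X^{-1}$ has the following properties:

$$X^{-1}(A^c) = (X^{-1}(A))^c,$$

$$X^{-1}\Big(\bigcup_\alpha A_\alpha\Big) = \bigcup_\alpha X^{-1}(A_\alpha),$$

$$X^{-1}\Big(\bigcap_\alpha A_\alpha\Big) = \bigcap_\alpha X^{-1}(A_\alpha),$$

where $\alpha$ ranges over an arbitrary index set, not necessarily countable.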
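
These identities are easy to check mechanically on finite sets. The following Python sketch does so for one function that is not one-to-one; the sample space, the finite stand-in for the range, and the helper name `preimage` are assumptions made for illustration only.

```python
def preimage(X, domain, A):
    """X^{-1}(A) = {w in domain : X(w) in A}."""
    return frozenset(w for w in domain if X(w) in A)

OMEGA = frozenset(range(-3, 4))       # a small sample space
TARGET = frozenset(range(0, 10))      # a finite piece of the range, standing in for R^1
X = lambda w: w * w                   # any function will do; it need not be one-to-one

A1, A2 = frozenset({0, 1, 4}), frozenset({4, 9})

complement_ok = preimage(X, OMEGA, TARGET - A1) == OMEGA - preimage(X, OMEGA, A1)
union_ok = preimage(X, OMEGA, A1 | A2) == preimage(X, OMEGA, A1) | preimage(X, OMEGA, A2)
inter_ok = preimage(X, OMEGA, A1 & A2) == preimage(X, OMEGA, A1) & preimage(X, OMEGA, A2)
print(complement_ok, union_ok, inter_ok)
```

A check of this kind is of course no proof, but it makes plain that nothing about measurability is used: only the definition of the inverse image.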


Theorem 3.1.2. $X$ is an r.v. if and only if for each real number $x$, or each
real number $x$ in a dense subset of $\mathscr{R}^1$, we have

$$\{\omega : X(\omega) \le x\} \in \mathscr{F}.$$

PROOF. The preceding condition may be written as

$$\forall x : X^{-1}((-\infty, x]) \in \mathscr{F}. \tag{3}$$

Consider the collection $\mathscr{A}$ of all subsets $S$ of $\mathscr{R}^1$ for which $X^{-1}(S) \in \mathscr{F}$. From
Theorem 3.1.1 and the defining properties of the Borel field $\mathscr{F}$, it follows that
if $S \in \mathscr{A}$, then

$$X^{-1}(S^c) = (X^{-1}(S))^c \in \mathscr{F};$$

if $\forall j : S_j \in \mathscr{A}$, then

$$X^{-1}\Big(\bigcup_j S_j\Big) = \bigcup_j X^{-1}(S_j) \in \mathscr{F}.$$

Thus $S^c \in \mathscr{A}$ and $\bigcup_j S_j \in \mathscr{A}$ and consequently $\mathscr{A}$ is a B.F. This B.F. contains
all intervals of the form $(-\infty, x]$, which generate $\mathscr{B}^1$ even if $x$ is restricted to
a dense set, hence $\mathscr{A} \supset \mathscr{B}^1$, which means that $X^{-1}(B) \in \mathscr{F}$ for each $B \in \mathscr{B}^1$.
Thus $X$ is an r.v. by definition. This proves the “if” part of the theorem; the
“only if” part is trivial.
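
On a finite space the criterion of the theorem can be tested directly. The sketch below assumes a four-point space whose B.F. is generated by the partition $\{1, 2\}, \{3\}, \{4\}$; the functions `X_good` and `X_bad` and the helper `is_rv` are hypothetical names introduced for this illustration.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})
# B.F. generated by the partition {1, 2}, {3}, {4}: all unions of blocks.
blocks = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
F_sets = {frozenset().union(*c) for r in range(len(blocks) + 1)
          for c in combinations(blocks, r)}

def is_rv(X):
    """Check {w : X(w) <= x} in F for every x among the values of X.

    On a finite space these finitely many x already decide the matter,
    since {X <= x} only changes when x crosses a value of X.
    """
    values = sorted({X[w] for w in OMEGA})
    return all(frozenset(w for w in OMEGA if X[w] <= x) in F_sets for x in values)

X_good = {1: 5.0, 2: 5.0, 3: -1.0, 4: 2.0}   # constant on each block
X_bad = {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0}     # separates the points 1 and 2
print(is_rv(X_good), is_rv(X_bad))
```

The second function fails because $\{X \le 0\} = \{1\}$ is not a union of blocks; measurability here amounts to being constant on each block of the generating partition.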

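Since $\mathscr{P}(\cdot)$ is defined on $\mathscr{F}$, the probability of the set in (1) is defined
and will be written as

$$\mathscr{P}\{X(\omega) \in B\} \quad \text{or} \quad \mathscr{P}\{X \in B\}.$$

The next theorem relates the p.m. $\mathscr{P}$ to a p.m. on $(\mathscr{R}^1, \mathscr{B}^1)$ as discussed
in Sec. 2.2.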


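Theorem 3.1.3. Each r.v. on the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ induces a
probability space $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ by means of the following correspondence:

$$\forall B \in \mathscr{B}^1 : \mu(B) = \mathscr{P}(X^{-1}(B)) = \mathscr{P}\{X \in B\}. \tag{4}$$

PROOF. Clearly $\mu(B) \ge 0$. If the $B_n$’s are disjoint sets in $\mathscr{B}^1$, then the
$X^{-1}(B_n)$’s are disjoint by Theorem 3.1.1. Hence

$$\mu\Big(\bigcup_n B_n\Big) = \mathscr{P}\Big(X^{-1}\Big(\bigcup_n B_n\Big)\Big) = \mathscr{P}\Big(\bigcup_n X^{-1}(B_n)\Big) = \sum_n \mathscr{P}(X^{-1}(B_n)) = \sum_n \mu(B_n).$$

Finally $X^{-1}(\mathscr{R}^1) = \Omega$, hence $\mu(\mathscr{R}^1) = 1$. Thus $\mu$ is a p.m.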


The collection of sets $\{X^{-1}(S), S \subset \mathscr{R}^1\}$ is a B.F. for any function $X$. If
$X$ is an r.v. then the collection $\{X^{-1}(B), B \in \mathscr{B}^1\}$ is called the B.F. generated
by $X$. It is the smallest Borel subfield of $\mathscr{F}$ which contains all sets of the form
$\{\omega : X(\omega) \le x\}$, where $x \in \mathscr{R}^1$. Thus (4) is a convenient way of representing
the measure $\mathscr{P}$ when it is restricted to this subfield; symbolically we may
write it as follows:

$$\mu = \mathscr{P} \circ X^{-1}.$$

This $\mu$ is called the “probability distribution measure” or p.m. of $X$, and its
associated d.f. $F$ according to Theorem 2.2.4 will be called the d.f. of $X$.
Specifically, $F$ is given by

$$F(x) = \mu((-\infty, x]) = \mathscr{P}\{X \le x\}.$$


While the r.v. X determines w and therefore F, the converse is obviously 
false. A family of r.v.’s having the same distribution is said to be “identically 
distributed”. 
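
A finite example may help fix the distinction between an r.v. and its distribution. The Python sketch below, which assumes a fair die as the sample space, computes the induced measure $\mathscr{P} \circ X^{-1}$ for two different functions and finds the same distribution; the helper names `distribution` and `df` are introduced only for this illustration.

```python
from fractions import Fraction
from collections import defaultdict

OMEGA = [1, 2, 3, 4, 5, 6]                  # one throw of a fair die (illustrative)
P = {w: Fraction(1, 6) for w in OMEGA}

def distribution(X):
    """Point masses of mu = P o X^{-1} for a r.v. on a finite space."""
    mu = defaultdict(Fraction)
    for w in OMEGA:
        mu[X(w)] += P[w]
    return dict(mu)

def df(mu, x):
    """F(x) = mu((-inf, x])."""
    return sum((p for v, p in mu.items() if v <= x), Fraction(0))

X = lambda w: w
Y = lambda w: 7 - w            # a different function of w

mu_X, mu_Y = distribution(X), distribution(Y)
same_distribution = mu_X == mu_Y
same_function = all(X(w) == Y(w) for w in OMEGA)
print(same_distribution, same_function, df(mu_X, 2))
```

The two functions never take the same value at the same sample point, yet they are identically distributed; this is the finite counterpart of the pair $\omega$ and $1 - \omega$ in Example 2 below.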


Example 1. Let $(\Omega, \mathscr{S})$ be a discrete sample space (see Example 1 of Sec. 2.2).
Every numerically valued function is an r.v.


Example 2. $(\mathscr{U}, \mathscr{B}, m)$.

In this case an r.v. is by definition just a Borel measurable function. According to
the usual definition, $f$ on $\mathscr{U}$ is Borel measurable iff $f^{-1}(\mathscr{B}^1) \subset \mathscr{B}$. In particular, the
function $f$ given by $f(\omega) \equiv \omega$ is an r.v. The two r.v.’s $\omega$ and $1 - \omega$ are not identical
but are identically distributed; in fact their common distribution is the underlying
measure $m$.


Example 3. $(\mathscr{R}^1, \mathscr{B}^1, \mu)$.

The definition of a Borel measurable function is not affected, since no measure
is involved; so any such function is an r.v., whatever the given p.m. $\mu$ may be. As
in Example 2, there exists an r.v. with the underlying $\mu$ as its p.m.; see Exercise 3
below.


We proceed to produce new r.v.’s from given ones. 


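Theorem 3.1.4. If $X$ is an r.v., $f$ a Borel measurable function [on $(\mathscr{R}^1, \mathscr{B}^1)$],
then $f(X)$ is an r.v.


PROOF. The quickest proof is as follows. Regarding the function $f(X)$ of
$\omega$ as the “composite mapping”:

$$f \circ X : \omega \to f(X(\omega)),$$

we have $(f \circ X)^{-1} = X^{-1} \circ f^{-1}$ and consequently

$$(f \circ X)^{-1}(\mathscr{B}^1) = X^{-1}(f^{-1}(\mathscr{B}^1)) \subset X^{-1}(\mathscr{B}^1) \subset \mathscr{F}.$$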


The reader who is not familiar with operations of this kind is advised to spell 
out the proof above in the old-fashioned manner, which takes only a little 
longer. 


We must now discuss the notion of a random vector. This is just a vector 
each of whose components is an r.v. It is sufficient to consider the case of two 
dimensions, since there is no essential difference in higher dimensions apart 
from complication in notation. 
We recall first that in the 2-dimensional Euclidean space $\mathscr{R}^2$, or the plane,
the Euclidean Borel field $\mathscr{B}^2$ is generated by rectangles of the form

$$\{(x, y) : a < x \le b,\ c < y \le d\}.$$

A fortiori, it is also generated by product sets of the form

$$B_1 \times B_2 = \{(x, y) : x \in B_1,\ y \in B_2\},$$

where $B_1$ and $B_2$ belong to $\mathscr{B}^1$. The collection of sets, each of which is a finite
union of disjoint product sets, forms a field $\mathscr{B}^2_0$. A function $f$ from $\mathscr{R}^2$ into $\mathscr{R}^1$
is called a Borel measurable function (of two variables) iff $f^{-1}(\mathscr{B}^1) \subset \mathscr{B}^2$.
Written out, this says that for each 1-dimensional Borel set $B$, viz., a member
of $\mathscr{B}^1$, the set

$$\{(x, y) : f(x, y) \in B\}$$

is a 2-dimensional Borel set, viz. a member of $\mathscr{B}^2$.
Now let $X$ and $Y$ be two r.v.’s on $(\Omega, \mathscr{F}, \mathscr{P})$. The random vector $(X, Y)$
induces a probability $\nu$ on $\mathscr{B}^2$ as follows:

$$\forall A \in \mathscr{B}^2 : \nu(A) = \mathscr{P}\{(X, Y) \in A\}, \tag{5}$$

the right side being an abbreviation of $\mathscr{P}(\{\omega : (X(\omega), Y(\omega)) \in A\})$. This $\nu$
is called the (2-dimensional, probability) distribution or simply the p.m. of
$(X, Y)$.

Let us also define, in imitation of $X^{-1}$, the inverse mapping $(X, Y)^{-1}$ by
the following formula:

$$\forall A \in \mathscr{B}^2 : (X, Y)^{-1}(A) = \{\omega : (X, Y) \in A\}.$$

This mapping has properties analogous to those of $X^{-1}$ given in
Theorem 3.1.1, since the latter is actually true for a mapping of any two
abstract spaces. We can now easily generalize Theorem 3.1.4.


Theorem 3.1.5. If $X$ and $Y$ are r.v.’s and $f$ is a Borel measurable function
of two variables, then $f(X, Y)$ is an r.v.

PROOF.

$$[f \circ (X, Y)]^{-1}(\mathscr{B}^1) = (X, Y)^{-1}(f^{-1}(\mathscr{B}^1)) \subset (X, Y)^{-1}(\mathscr{B}^2) \subset \mathscr{F}.$$

The last inclusion says the inverse mapping $(X, Y)^{-1}$ carries each 2-dimensional
Borel set into a set in $\mathscr{F}$. This is proved as follows. If $A =
B_1 \times B_2$, where $B_1 \in \mathscr{B}^1$, $B_2 \in \mathscr{B}^1$, then it is clear that

$$(X, Y)^{-1}(A) = X^{-1}(B_1) \cap Y^{-1}(B_2) \in \mathscr{F}$$

by (2). Now the collection of sets $A$ in $\mathscr{R}^2$ for which $(X, Y)^{-1}(A) \in \mathscr{F}$ forms
a B.F. by the analogue of Theorem 3.1.1. It follows from what has just been
shown that this B.F. contains $\mathscr{B}^2_0$ hence it must also contain $\mathscr{B}^2$. Hence each
set in $\mathscr{B}^2$ belongs to the collection, as was to be proved.


Here are some important special cases of Theorems 3.1.4 and 3.1.5. 
Throughout the book we shall use the notation for numbers as well as functions:

$$x \vee y = \max(x, y), \quad x \wedge y = \min(x, y). \tag{6}$$

Corollary. If $X$ is an r.v. and $f$ is a continuous function on $\mathscr{R}^1$, then $f(X)$
is an r.v.; in particular $X^r$ for positive integer $r$, $|X|^r$ for positive real $r$, $e^{-\lambda X}$,
$e^{itX}$ for real $\lambda$ and $t$, are all r.v.’s (the last being complex-valued). If $X$ and $Y$
are r.v.’s, then

$$X \vee Y, \quad X \wedge Y, \quad X + Y, \quad X - Y, \quad X \cdot Y, \quad X/Y$$

are r.v.’s, the last provided $Y$ does not vanish.


Generalization to a finite number of r.v.’s is immediate. Passing to an 
infinite sequence, let us state the following theorem, although its analogue in 
real function theory should be well known to the reader. 


Theorem 3.1.6. If $\{X_j, j \ge 1\}$ is a sequence of r.v.’s, then

$$\inf_j X_j, \quad \sup_j X_j, \quad \liminf_j X_j, \quad \limsup_j X_j$$

are r.v.’s, not necessarily finite-valued with probability one though everywhere
defined, and

$$\lim_{j \to \infty} X_j$$

is an r.v. on the set $\Delta$ on which there is either convergence or divergence to
$\pm\infty$.


PROOF. To see, for example, that $\sup_j X_j$ is an r.v., we need only observe
the relation

$$\forall x \in \mathscr{R}^1 : \Big\{\sup_j X_j \le x\Big\} = \bigcap_j \{X_j \le x\}$$

and use Theorem 3.1.2. Since

$$\limsup_j X_j = \inf_n \Big(\sup_{j \ge n} X_j\Big),$$

and $\lim_{j \to \infty} X_j$ exists [and is finite] on the set where $\limsup_j X_j = \liminf_j X_j$
[and is finite], which belongs to $\mathscr{F}$, the rest follows.


Here already we see the necessity of the general definition of an r.v. 
given at the beginning of this section. 


DEFINITION. An r.v. $X$ is called discrete (or countably valued) iff there
is a countable set $B \subset \mathscr{R}^1$ such that $\mathscr{P}(X \in B) = 1$.


It is easy to see that $X$ is discrete if and only if its d.f. is. Perhaps it is
worthwhile to point out that a discrete r.v. need not have a range that is discrete 
in the sense of Euclidean topology, even apart from a set of probability zero. 
Consider, for example, an r.v. with the d.f. in Example 2 of Sec. 1.1. 

The following terminology and notation will be used throughout the book 
for an arbitrary set $\Omega$, not necessarily the sample space.


DEFINITION. For each $\Delta \subset \Omega$, the function $1_\Delta(\cdot)$ defined as follows:

$$\forall \omega \in \Omega : 1_\Delta(\omega) = \begin{cases} 1, & \text{if } \omega \in \Delta, \\ 0, & \text{if } \omega \in \Omega \setminus \Delta, \end{cases}$$

is called the indicator (function) of $\Delta$.


Clearly $1_\Delta$ is an r.v. if and only if $\Delta \in \mathscr{F}$.
A countable partition of $\Omega$ is a countable family of disjoint sets $\{\Lambda_j\}$,
with $\Lambda_j \in \mathscr{F}$ for each $j$ and such that $\Omega = \bigcup_j \Lambda_j$. We have then

$$1 = 1_\Omega = \sum_j 1_{\Lambda_j}.$$

More generally, let $b_j$ be arbitrary real numbers, then the function $\varphi$ defined
below:

$$\forall \omega \in \Omega : \varphi(\omega) = \sum_j b_j 1_{\Lambda_j}(\omega),$$

is a discrete r.v. We shall call $\varphi$ the r.v. belonging to the weighted partition
$\{\Lambda_j; b_j\}$. Each discrete r.v. $X$ belongs to a certain partition. For let $\{b_j\}$ be
the countable set in the definition of $X$ and let $\Lambda_j = \{\omega : X(\omega) = b_j\}$, then $X$
belongs to the weighted partition $\{\Lambda_j; b_j\}$. If $j$ ranges over a finite index set,
the partition is called finite and the r.v. belonging to it simple.
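
The passage from a weighted partition to its r.v. and back can be sketched in a few lines of Python. The sample space, the three blocks and the weights below are hypothetical; the weights are taken distinct so that the partition can be recovered from the values of the function.

```python
OMEGA = range(12)    # a finite sample space, assumed only for illustration

def indicator(A):
    """Return the indicator function of the set A."""
    return lambda w: 1 if w in A else 0

# A finite partition of OMEGA with weights b_j.
partition = [set(range(0, 4)), set(range(4, 9)), set(range(9, 12))]
weights = [2.5, -1.0, 7.0]

def phi(w):
    """The r.v. belonging to the weighted partition {partition[j]; weights[j]}."""
    return sum(b * indicator(A)(w) for A, b in zip(partition, weights))

# The indicators of a partition sum to 1 everywhere.
sums_to_one = all(sum(indicator(A)(w) for A in partition) == 1 for w in OMEGA)

# Recover the partition from phi: the set where phi equals b_j (weights are distinct here).
recovered = [{w for w in OMEGA if phi(w) == b} for b in weights]
print(sums_to_one, recovered == partition)
```

Since the index set is finite, `phi` is a simple r.v. in the terminology just introduced.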


EXERCISES 


1. Prove Theorem 3.1.1. For the “direct mapping” $X$, which of these
properties of $X^{-1}$ holds?

2. If two r.v.’s are equal a.e., then they have the same p.m.

*3. Given any p.m. $\mu$ on $(\mathscr{R}^1, \mathscr{B}^1)$, define an r.v. whose p.m. is $\mu$. Can
this be done in an arbitrary probability space?

*4. Let $\theta$ be uniformly distributed on [0,1]. For each d.f. $F$, define $G(y) =
\sup\{x : F(x) < y\}$. Then $G(\theta)$ has the d.f. $F$.

*5. Suppose $X$ has the continuous d.f. $F$, then $F(X)$ has the uniform
distribution on [0,1]. What if $F$ is not continuous?

6. Is the range of an r.v. necessarily Borel or Lebesgue measurable?

7. The sum, difference, product, or quotient (denominator nonvanishing)
of the two discrete r.v.’s is discrete.

8. If $\Omega$ is discrete (countable), then every r.v. is discrete. Conversely,
every r.v. in a probability space is discrete if and only if the p.m. is atomic.
[HINT: Use Exercise 23 of Sec. 2.2.]

9. If $f$ is Borel measurable, and $X$ and $Y$ are identically distributed, then
so are $f(X)$ and $f(Y)$.

10. Express the indicators of $\Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2$, $\Lambda_1 \setminus \Lambda_2$, $\Lambda_1 \,\triangle\, \Lambda_2$,
$\limsup \Lambda_n$, $\liminf \Lambda_n$ in terms of those of $\Lambda_1$, $\Lambda_2$, or $\Lambda_n$. [For the definitions
of the limits see Sec. 4.2.]

*11. Let $\mathscr{F}\{X\}$ be the minimal B.F. with respect to which $X$ is measurable.
Show that $\Lambda \in \mathscr{F}\{X\}$ if and only if $\Lambda = X^{-1}(B)$ for some $B \in \mathscr{B}^1$. Is this $B$
unique? Can there be a set $A \notin \mathscr{B}^1$ such that $\Lambda = X^{-1}(A)$?

12. Generalize the assertion in Exercise 11 to a finite set of r.v.’s. [It is 
possible to generalize even to an arbitrary set of r.v.’s.] 
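
The statement of Exercise 4 can be explored numerically before it is proved. The Python sketch below takes a hypothetical discrete d.f. with three atoms, evaluates $G$ at the midpoints of a fine equal grid of $(0, 1]$ in place of a uniformly distributed $\theta$, and tabulates how often each atom is hit. It is an illustration for one particular $F$, not a solution of the exercise; the names `xs`, `ps`, `cum` and `freq` are introduced only here.

```python
from bisect import bisect_left
from fractions import Fraction

# A discrete d.f. F with atoms at xs and masses ps (assumed for illustration).
xs = [-1, 0, 2]
ps = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
cum = [sum(ps[:k + 1], Fraction(0)) for k in range(len(ps))]   # F at the atoms

def G(y):
    """G(y) = sup{x : F(x) < y} for 0 < y <= 1, i.e. the first atom x_k with F(x_k) >= y."""
    return xs[bisect_left(cum, y)]

# Evaluate G on the midpoints of a fine equal grid of (0, 1]; the fraction of
# grid points with G = x_k approximates P{G(theta) = x_k} for uniform theta.
n = 1000
counts = {x: 0 for x in xs}
for i in range(n):
    counts[G(Fraction(2 * i + 1, 2 * n))] += 1
freq = {x: Fraction(c, n) for x, c in counts.items()}
print(freq)
```

Because the grid is chosen so that no midpoint falls on a value of $F$, the tabulated frequencies reproduce the masses `ps` exactly in this example.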


3.2 Properties of mathematical expectation 


The concept of “(mathematical) expectation” is the same as that of integration 
in the probability space with respect to the measure $\mathscr{P}$. The reader is supposed
to have some acquaintance with this, at least in the particular case $(\mathscr{U}, \mathscr{B}, m)$
or $(\mathscr{R}^1, \mathscr{B}^1, m)$. [In the latter case, the measure not being finite, the theory
of integration is slightly more complicated.] The general theory is not much
different and will be briefly reviewed. The r.v.’s below will be tacitly assumed
to be finite everywhere to avoid trivial complications.

For each positive discrete r.v. $X$ belonging to the weighted partition
$\{\Lambda_j; b_j\}$, we define its expectation to be

$$\mathscr{E}(X) = \sum_j b_j \mathscr{P}\{\Lambda_j\}. \tag{1}$$


This is either a positive finite number or $+\infty$. It is trivial that if $X$ belongs
to different partitions, the corresponding values given by (1) agree. Now let
$X$ be an arbitrary positive r.v. For any two positive integers $m$ and $n$, the set

$$\Lambda_{mn} = \left\{\omega : \frac{n}{2^m} \le X(\omega) < \frac{n+1}{2^m}\right\}$$

belongs to $\mathscr{F}$. For each $m$, let $X_m$ denote the r.v. belonging to the weighted
partition $\{\Lambda_{mn}; n/2^m\}$; thus $X_m = n/2^m$ if and only if $n/2^m \le X < (n+1)/2^m$.
It is easy to see that we have for each $m$:

$$\forall \omega : X_m(\omega) \le X_{m+1}(\omega); \quad 0 \le X(\omega) - X_m(\omega) < \frac{1}{2^m}.$$

Consequently there is monotone convergence:

$$\forall \omega : \lim_{m \to \infty} X_m(\omega) = X(\omega).$$

The expectation of $X_m$ has just been defined; it is

$$\mathscr{E}(X_m) = \sum_{n=0}^{\infty} \frac{n}{2^m} \mathscr{P}\left\{\frac{n}{2^m} \le X < \frac{n+1}{2^m}\right\}.$$

If for one value of $m$ we have $\mathscr{E}(X_m) = +\infty$, then we define $\mathscr{E}(X) = +\infty$;
otherwise we define

$$\mathscr{E}(X) = \lim_{m \to \infty} \mathscr{E}(X_m),$$

the limit existing, finite or infinite, since $\mathscr{E}(X_m)$ is an increasing sequence of
real numbers. It should be shown that when $X$ is discrete, this new definition
agrees with the previous one.
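
The dyadic approximation can be followed step by step in exact arithmetic. The sketch below assumes a three-point probability space and a positive r.v. with rational values (both hypothetical), forms $X_m$ by rounding down to multiples of $2^{-m}$, and compares $\mathscr{E}(X_m)$ with the value given directly by formula (1).

```python
from fractions import Fraction
from math import floor

# A positive r.v. on a finite space, with exact rational values (illustrative).
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
X = {"a": Fraction(1, 3), "b": Fraction(5, 7), "c": Fraction(9, 4)}

def X_m(w, m):
    """Dyadic approximation: n / 2^m on the set where n/2^m <= X < (n+1)/2^m."""
    return Fraction(floor(X[w] * 2**m), 2**m)

def expectation(values):
    """Formula (1) for a discrete r.v. given as a dict w -> value."""
    return sum((values[w] * P[w] for w in P), Fraction(0))

E_X = expectation(X)
E_m = [expectation({w: X_m(w, m) for w in P}) for m in range(1, 12)]

increasing = all(E_m[k] <= E_m[k + 1] for k in range(len(E_m) - 1))
gaps_ok = all(0 <= E_X - e < Fraction(1, 2**m) for m, e in zip(range(1, 12), E_m))
print(increasing, gaps_ok, float(E_X), float(E_m[-1]))
```

The check `gaps_ok` is the integrated form of the pointwise bound $0 \le X - X_m < 2^{-m}$, which is what makes the two definitions agree for this discrete $X$.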

For an arbitrary X, put as usual 


$$X = X^+ - X^- \quad \text{where} \quad X^+ = X \vee 0, \quad X^- = (-X) \vee 0. \tag{2}$$

Both $X^+$ and $X^-$ are positive r.v.’s, and so their expectations are defined.
Unless both $\mathscr{E}(X^+)$ and $\mathscr{E}(X^-)$ are $+\infty$, we define

$$\mathscr{E}(X) = \mathscr{E}(X^+) - \mathscr{E}(X^-) \tag{3}$$

with the usual convention regarding $\infty$. We say $X$ has a finite or infinite
expectation (or expected value) according as $\mathscr{E}(X)$ is a finite number or $\pm\infty$.
In the excepted case we shall say that the expectation of $X$ does not exist. The
expectation, when it exists, is also denoted by

$$\int_\Omega X(\omega)\,\mathscr{P}(d\omega).$$

More generally, for each $\Lambda$ in $\mathscr{F}$, we define

$$\int_\Lambda X(\omega)\,\mathscr{P}(d\omega) = \mathscr{E}(X \cdot 1_\Lambda) \tag{4}$$

and call it “the integral of $X$ (with respect to $\mathscr{P}$) over the set $\Lambda$”. We shall
say that $X$ is integrable with respect to $\mathscr{P}$ over $\Lambda$ iff the integral above exists
and is finite.

In the case of $(\mathscr{R}^1, \mathscr{B}^1, \mu)$, if we write $X = f$, $\omega = x$, the integral

$$\int_\Lambda X(\omega)\,\mathscr{P}(d\omega) = \int_\Lambda f(x)\,\mu(dx)$$

is just the ordinary Lebesgue-Stieltjes integral of $f$ with respect to $\mu$. If $F$
is the d.f. of $\mu$ and $\Lambda = (a, b]$, this is also written as

$$\int_{(a,b]} f(x)\,dF(x).$$

This classical notation is really an anachronism, originated in the days when a
point function was more popular than a set function. The notation above must
then be amended to

$$\int_{a+0}^{b+0}, \quad \int_{a-0}^{b+0}, \quad \int_{a+0}^{b-0}, \quad \int_{a-0}^{b-0}$$

to distinguish clearly between the four kinds of intervals $(a, b]$, $[a, b]$, $(a, b)$,
$[a, b)$.

In the case of $(\mathscr{U}, \mathscr{B}, m)$, the integral reduces to the ordinary Lebesgue
integral

$$\int_a^b f(x)\,m(dx) = \int_a^b f(x)\,dx.$$

Here $m$ is atomless, so the notation is adequate and there is no need to distinguish
between the different kinds of intervals.

The general integral has the familiar properties of the Lebesgue integral 
on [0,1]. We list a few below for ready reference, some being easy consequences
of others. As a general notation, the left member of (4) will be
abbreviated to $\int_\Lambda X\,d\mathscr{P}$. In the following, $X$, $Y$ are r.v.’s; $a$, $b$ are constants;
$\Lambda$ is a set in $\mathscr{F}$.


(i) Absolute integrability. $\int_\Lambda X\,d\mathscr{P}$ is finite if and only if

$$\int_\Lambda |X|\,d\mathscr{P} < \infty.$$

(ii) Linearity.

$$\int_\Lambda (aX + bY)\,d\mathscr{P} = a \int_\Lambda X\,d\mathscr{P} + b \int_\Lambda Y\,d\mathscr{P}$$

provided that the right side is meaningful, namely not $+\infty - \infty$ or $-\infty + \infty$.
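
(iii) Additivity over sets. If the $\Lambda_n$’s are disjoint, then

$$\int_{\bigcup_n \Lambda_n} X\,d\mathscr{P} = \sum_n \int_{\Lambda_n} X\,d\mathscr{P}.$$

(iv) Positivity. If $X \ge 0$ a.e. on $\Lambda$, then

$$\int_\Lambda X\,d\mathscr{P} \ge 0.$$

(v) Monotonicity. If $X_1 \le X \le X_2$ a.e. on $\Lambda$, then

$$\int_\Lambda X_1\,d\mathscr{P} \le \int_\Lambda X\,d\mathscr{P} \le \int_\Lambda X_2\,d\mathscr{P}.$$

(vi) Mean value theorem. If $a \le X \le b$ a.e. on $\Lambda$, then

$$a\,\mathscr{P}(\Lambda) \le \int_\Lambda X\,d\mathscr{P} \le b\,\mathscr{P}(\Lambda).$$

(vii) Modulus inequality.

$$\left|\int_\Lambda X\,d\mathscr{P}\right| \le \int_\Lambda |X|\,d\mathscr{P}.$$
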
(viii) Dominated convergence theorem. If $\lim_{n \to \infty} X_n = X$ a.e. or merely
in measure on $\Lambda$ and $\forall n : |X_n| \le Y$ a.e. on $\Lambda$, with $\int_\Lambda Y\,d\mathscr{P} < \infty$, then

$$\lim_{n \to \infty} \int_\Lambda X_n\,d\mathscr{P} = \int_\Lambda X\,d\mathscr{P} = \int_\Lambda \lim_{n \to \infty} X_n\,d\mathscr{P}. \tag{5}$$

(ix) Bounded convergence theorem. If $\lim_{n \to \infty} X_n = X$ a.e. or merely
in measure on $\Lambda$ and there exists a constant $M$ such that $\forall n : |X_n| \le M$ a.e.
on $\Lambda$, then (5) is true.

(x) Monotone convergence theorem. If $X_n \ge 0$ and $X_n \uparrow X$ a.e. on $\Lambda$,
then (5) is again true provided that $+\infty$ is allowed as a value for either
member. The condition “$X_n \ge 0$” may be weakened to: “$\mathscr{E}(X_n) > -\infty$ for
some $n$”.

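(xi) Integration term by term. If

$$\sum_n \int_\Lambda |X_n|\,d\mathscr{P} < \infty,$$

then $\sum_n |X_n| < \infty$ a.e. on $\Lambda$ so that $\sum_n X_n$ converges a.e. on $\Lambda$ and

$$\int_\Lambda \sum_n X_n\,d\mathscr{P} = \sum_n \int_\Lambda X_n\,d\mathscr{P}.$$

(xii) Fatou’s lemma. If $X_n \ge 0$ a.e. on $\Lambda$, then

$$\int_\Lambda \Big(\liminf_{n \to \infty} X_n\Big)\,d\mathscr{P} \le \liminf_{n \to \infty} \int_\Lambda X_n\,d\mathscr{P}.$$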


Let us prove the following useful theorem as an instructive example. 


Theorem 3.2.1. We have

$$\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n) \le \mathscr{E}(|X|) \le 1 + \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n) \tag{6}$$

so that $\mathscr{E}(|X|) < \infty$ if and only if the series above converges.


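PROOF. By the additivity property (iii), if $\Lambda_n = \{n \le |X| < n+1\}$,

$$\mathscr{E}(|X|) = \sum_{n=0}^{\infty} \int_{\Lambda_n} |X|\,d\mathscr{P}.$$

Hence by the mean value theorem (vi) applied to each set $\Lambda_n$:

$$\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) \le \mathscr{E}(|X|) \le \sum_{n=0}^{\infty} (n+1)\,\mathscr{P}(\Lambda_n) = 1 + \sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n). \tag{7}$$

It remains to show

$$\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) = \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n), \tag{8}$$

finite or infinite. Since $\mathscr{P}(\Lambda_n) = \mathscr{P}(|X| \ge n) - \mathscr{P}(|X| \ge n+1)$, the
partial sums of the series on the left may be rearranged (Abel's method of
partial summation!) to yield, for $N \ge 1$,

$$\begin{aligned}
\sum_{n=0}^{N} n\{\mathscr{P}(|X| \ge n) - \mathscr{P}(|X| \ge n+1)\}
&= \sum_{n=1}^{N} \{n - (n-1)\}\,\mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1) \\
&= \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1).
\end{aligned} \tag{9}$$

Thus we have

$$\sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) \le \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) \le \sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) + N\,\mathscr{P}(|X| \ge N+1). \tag{10}$$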




Another application of the mean value theorem gives

$$N\,\mathscr{P}(|X| \ge N+1) \le \int_{(|X| \ge N+1)} |X|\,d\mathscr{P}.$$

Hence if $\mathscr{E}(|X|) < \infty$, then the last term in (10) converges to zero as $N \to \infty$
and (8) follows with both sides finite. On the other hand, if $\mathscr{E}(|X|) = \infty$, then
the second term in (10) diverges with the first as $N \to \infty$, so that (8) is also
true with both sides infinite.


Corollary. If $X$ takes only positive integer values, then

$$\mathscr{E}(X) = \sum_{n=1}^{\infty} \mathscr{P}(X \ge n).$$
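
The corollary is easy to check on a small case. The following is only an
illustrative sketch, not part of the text: it assumes a fair die as the
positive integer-valued r.v., and the helper names `mean_direct` and
`mean_tail_sum` are hypothetical. Exact rational arithmetic is used so that
the two sides can be compared without rounding.

```python
from fractions import Fraction

def mean_direct(pmf):
    """E(X) from the definition: sum of k * P(X = k)."""
    return sum(k * p for k, p in pmf.items())

def mean_tail_sum(pmf):
    """E(X) as in the corollary: sum over n >= 1 of P(X >= n)."""
    top = max(pmf)  # P(X >= n) = 0 for every n > top
    return sum(
        sum(p for k, p in pmf.items() if k >= n)
        for n in range(1, top + 1)
    )

# Assumed example: a fair die, values 1..6 with probability 1/6 each.
die = {k: Fraction(1, 6) for k in range(1, 7)}
```

Both `mean_direct(die)` and `mean_tail_sum(die)` evaluate the same
expectation; the second groups the mass by tails exactly as in (8).
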
EXERCISES 


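1. If $X \ge 0$ a.e. on $\Lambda$ and $\int_\Lambda X\,d\mathscr{P} = 0$, then $X = 0$ a.e. on $\Lambda$.
*2. If $\mathscr{E}(|X|) < \infty$ and $\lim_{n\to\infty} \mathscr{P}(\Lambda_n) = 0$, then
$\lim_{n\to\infty} \int_{\Lambda_n} X\,d\mathscr{P} = 0$. In particular

$$\lim_{n\to\infty} \int_{\{|X| > n\}} X\,d\mathscr{P} = 0.$$

3. Let $X \ge 0$ and $\int_\Omega X\,d\mathscr{P} = A$, $0 < A < \infty$. Then the set function $\nu$
defined on $\mathscr{F}$ as follows:

$$\nu(\Lambda) = \frac{1}{A} \int_\Lambda X\,d\mathscr{P},$$

is a probability measure on $\mathscr{F}$.
4. Let $c$ be a fixed constant, $c > 0$. Then $\mathscr{E}(|X|) < \infty$ if and only if

$$\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge cn) < \infty.$$

In particular, if the last series converges for one value of $c$, it converges for
all values of $c$.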


5. For any $r > 0$, $\mathscr{E}(|X|^r) < \infty$ if and only if

$$\sum_{n=1}^{\infty} n^{r-1}\,\mathscr{P}(|X| \ge n) < \infty.$$

*6. Suppose that $\sup_n |X_n| \le Y$ on $\Lambda$ with $\int_\Lambda Y\,d\mathscr{P} < \infty$. Deduce from
Fatou's lemma:

$$\int_\Lambda \Bigl(\limsup_{n\to\infty} X_n\Bigr)\,d\mathscr{P} \ge \limsup_{n\to\infty} \int_\Lambda X_n\,d\mathscr{P}.$$

Show this is false if the condition involving $Y$ is omitted.


*7. Given the r.v. $X$ with finite $\mathscr{E}(X)$, and $\epsilon > 0$, there exists a simple r.v.
$X_\epsilon$ (see the end of Sec. 3.1) such that

$$\mathscr{E}(|X - X_\epsilon|) < \epsilon.$$

Hence there exists a sequence of simple r.v.'s $X_m$ such that

$$\lim_{m\to\infty} \mathscr{E}(|X - X_m|) = 0.$$

We can choose $\{X_m\}$ so that $|X_m| \le |X|$ for all $m$.


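*8. For any two sets $\Lambda_1$ and $\Lambda_2$ in $\mathscr{F}$, define

$$\rho(\Lambda_1, \Lambda_2) = \mathscr{P}(\Lambda_1 \,\triangle\, \Lambda_2);$$

then $\rho$ is a pseudo-metric in the space of sets in $\mathscr{F}$; call the resulting metric
space $M(\mathscr{F}, \mathscr{P})$. Prove that for each integrable r.v. $X$ the mapping of $M(\mathscr{F}, \mathscr{P})$
to $\mathscr{R}^1$ given by $\Lambda \to \int_\Lambda X\,d\mathscr{P}$ is continuous. Similarly, the mappings on
$M(\mathscr{F}, \mathscr{P}) \times M(\mathscr{F}, \mathscr{P})$ to $M(\mathscr{F}, \mathscr{P})$ given by

$$(\Lambda_1, \Lambda_2) \to \Lambda_1 \cup \Lambda_2,\quad \Lambda_1 \cap \Lambda_2,\quad \Lambda_1 \setminus \Lambda_2,\quad \Lambda_1 \,\triangle\, \Lambda_2$$

are all continuous. If (see Sec. 4.2 below)

$$\limsup_n \Lambda_n = \liminf_n \Lambda_n$$

modulo a null set, we denote the common equivalence class of these two sets
by $\lim_n \Lambda_n$. Prove that in this case $\{\Lambda_n\}$ converges to $\lim_n \Lambda_n$ in the metric
$\rho$. Deduce Exercise 2 above as a special case.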

There is a basic relation between the abstract integral with respect to
$\mathscr{P}$ over sets in $\mathscr{F}$ on the one hand, and the Lebesgue-Stieltjes integral with
respect to $\mu$ over sets in $\mathscr{B}^1$ on the other, induced by each r.v. We give the
version in one dimension first.


Theorem 3.2.2. Let $X$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space
$(\mathscr{R}^1, \mathscr{B}^1, \mu)$ according to Theorem 3.1.3 and let $f$ be Borel measurable. Then
we have

$$\int_\Omega f(X(\omega))\,\mathscr{P}(d\omega) = \int_{\mathscr{R}^1} f(x)\,\mu(dx) \tag{11}$$

provided that either side exists.

PROOF. Let $B \in \mathscr{B}^1$, and $f = 1_B$, then the left side in (11) is $\mathscr{P}(X \in B)$
and the right side is $\mu(B)$. They are equal by the definition of $\mu$ in (4) of
Sec. 3.1. Now by the linearity of both integrals in (11), it will also hold if

$$f = \sum_j b_j 1_{B_j}, \tag{12}$$

namely the r.v. on $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ belonging to an arbitrary weighted partition
$\{B_j; b_j\}$. For an arbitrary positive Borel measurable function $f$ we can define,
as in the discussion of the abstract integral given above, a sequence $\{f_m, m \ge 1\}$
of the form (12) such that $f_m \uparrow f$ everywhere. For each of them we have

$$\int_\Omega f_m(X)\,d\mathscr{P} = \int_{\mathscr{R}^1} f_m\,d\mu; \tag{13}$$

hence, letting $m \to \infty$ and using the monotone convergence theorem, we
obtain (11) whether the limits are finite or not. This proves the theorem for
$f \ge 0$, and the general case follows in the usual way.
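
For a finite sample space, both sides of (11) are finite sums and the theorem
can be verified directly. The sketch below is an illustration under assumed
data: the sample space (two fair coin tosses), the r.v. `X`, the function `f`
and the helper `induced_measure` are all hypothetical choices, not taken from
the text.

```python
from fractions import Fraction
from collections import defaultdict

# Assumed sample space: two fair coin tosses, each outcome has probability 1/4.
omega = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
         "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def X(w):
    """Number of heads, an r.v. on omega."""
    return w.count("H")

def f(x):
    """A Borel measurable function of one real variable."""
    return (x - 1) ** 2

def induced_measure(prob, rv):
    """mu({x}) = P(X = x): the p.m. induced by rv on the real line."""
    mu = defaultdict(Fraction)
    for w, p in prob.items():
        mu[rv(w)] += p
    return dict(mu)

mu = induced_measure(omega, X)
left = sum(f(X(w)) * p for w, p in omega.items())   # integral over Omega
right = sum(f(x) * m for x, m in mu.items())        # integral over R^1 w.r.t. mu
```

The left side sums over sample points, the right side over values of $X$; the
regrouping of terms that makes them equal is the discrete form of the step
from indicators to (12) in the proof.
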


We shall need the generalization of the preceding theorem in several
dimensions. No change is necessary except for notation, which we will give
in two dimensions. Instead of the $\nu$ in (5) of Sec. 3.1, let us write the "mass
element" as $\mu^2(dx, dy)$ so that

$$\nu(A) = \iint_A \mu^2(dx, dy).$$

Theorem 3.2.3. Let $(X, Y)$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space
$(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$ and let $f$ be a Borel measurable function of two variables.
Then we have

$$\int_\Omega f(X(\omega), Y(\omega))\,\mathscr{P}(d\omega) = \iint_{\mathscr{R}^2} f(x, y)\,\mu^2(dx, dy). \tag{14}$$

Note that $f(X, Y)$ is an r.v. by Theorem 3.1.5.


As a consequence of Theorem 3.2.2, we have: if $\mu_X$ and $F_X$ denote,
respectively, the p.m. and d.f. induced by $X$, then we have

$$\mathscr{E}(X) = \int_{\mathscr{R}^1} x\,\mu_X(dx) = \int_{-\infty}^{\infty} x\,dF_X(x);$$

and more generally

$$\mathscr{E}(f(X)) = \int_{\mathscr{R}^1} f(x)\,\mu_X(dx) = \int_{-\infty}^{\infty} f(x)\,dF_X(x) \tag{15}$$

with the usual proviso regarding existence and finiteness.

Another important application is as follows: let $\mu^2$ be as in Theorem 3.2.3
and take $f(x, y)$ to be $x + y$ there. We obtain

$$\begin{aligned}
\mathscr{E}(X + Y) &= \iint_{\mathscr{R}^2} (x + y)\,\mu^2(dx, dy) \\
&= \iint_{\mathscr{R}^2} x\,\mu^2(dx, dy) + \iint_{\mathscr{R}^2} y\,\mu^2(dx, dy).
\end{aligned} \tag{16}$$

On the other hand, if we take $f(x, y)$ to be $x$ or $y$, respectively, we obtain

$$\mathscr{E}(X) = \iint_{\mathscr{R}^2} x\,\mu^2(dx, dy), \qquad \mathscr{E}(Y) = \iint_{\mathscr{R}^2} y\,\mu^2(dx, dy)$$

and consequently

$$\mathscr{E}(X + Y) = \mathscr{E}(X) + \mathscr{E}(Y). \tag{17}$$

This result is a case of the linearity of $\mathscr{E}$ given but not proved here; the proof
above reduces this property in the general case of $(\Omega, \mathscr{F}, \mathscr{P})$ to the
corresponding one in the special case $(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$. Such a reduction is frequently
useful when there are technical difficulties in the abstract treatment.

We end this section with a discussion of “moments”. 

Let $a$ be real, $r$ positive, then $\mathscr{E}(|X - a|^r)$ is called the absolute moment
of $X$ of order $r$, about $a$. It may be $+\infty$; otherwise, and if $r$ is an integer,
$\mathscr{E}((X - a)^r)$ is the corresponding moment. If $\mu$ and $F$ are, respectively, the
p.m. and d.f. of $X$, then we have by Theorem 3.2.2:

$$\mathscr{E}(|X - a|^r) = \int_{\mathscr{R}^1} |x - a|^r\,\mu(dx) = \int_{-\infty}^{\infty} |x - a|^r\,dF(x),$$

$$\mathscr{E}((X - a)^r) = \int_{\mathscr{R}^1} (x - a)^r\,\mu(dx) = \int_{-\infty}^{\infty} (x - a)^r\,dF(x).$$

For $r = 1$, $a = 0$, this reduces to $\mathscr{E}(X)$, which is also called the mean of $X$.
The moments about the mean are called central moments. That of order 2 is
particularly important and is called the variance, var $(X)$; its positive square
root is the standard deviation, $\sigma(X)$:

$$\operatorname{var}(X) = \sigma^2(X) = \mathscr{E}\{(X - \mathscr{E}(X))^2\} = \mathscr{E}(X^2) - \{\mathscr{E}(X)\}^2.$$


We note the inequality $\sigma^2(X) \le \mathscr{E}(X^2)$, which will be used a good deal
in Chapter 5. For any positive number $p$, $X$ is said to belong to
$L^p = L^p(\Omega, \mathscr{F}, \mathscr{P})$ iff $\mathscr{E}(|X|^p) < \infty$.

The well-known inequalities of Hölder and Minkowski (see, e.g.,
Natanson [3]) may be written as follows. Let $X$ and $Y$ be r.v.'s, $1 < p < \infty$
and $1/p + 1/q = 1$, then

$$|\mathscr{E}(XY)| \le \mathscr{E}(|XY|) \le \mathscr{E}(|X|^p)^{1/p}\,\mathscr{E}(|Y|^q)^{1/q}, \tag{18}$$

$$\{\mathscr{E}(|X + Y|^p)\}^{1/p} \le \mathscr{E}(|X|^p)^{1/p} + \mathscr{E}(|Y|^p)^{1/p}. \tag{19}$$

If $Y \equiv 1$ in (18), we obtain

$$\mathscr{E}(|X|) \le \mathscr{E}(|X|^p)^{1/p}; \tag{20}$$

for $p = 2$, (18) is called the Cauchy-Schwarz inequality. Replacing $|X|$ by
$|X|^r$, where $0 < r < p$, and writing $r' = pr$ in (20) we obtain

$$\mathscr{E}(|X|^r)^{1/r} \le \mathscr{E}(|X|^{r'})^{1/r'}, \qquad 0 < r < r' < \infty. \tag{21}$$

The last will be referred to as the Liapounov inequality. It is a special case of
the next inequality, of which we will sketch a proof.


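Jensen's inequality. If $\varphi$ is a convex function on $\mathscr{R}^1$, and $X$ and $\varphi(X)$ are
integrable r.v.'s, then

$$\varphi(\mathscr{E}(X)) \le \mathscr{E}(\varphi(X)). \tag{22}$$

PROOF. Convexity means: for every positive $\lambda_1, \ldots, \lambda_n$ with sum 1 we
have

$$\varphi\Bigl(\sum_{j=1}^{n} \lambda_j y_j\Bigr) \le \sum_{j=1}^{n} \lambda_j \varphi(y_j). \tag{23}$$

This is known to imply the continuity of $\varphi$, so that $\varphi(X)$ is an r.v. We shall
prove (22) for a simple r.v. and refer the general case to Theorem 9.1.4. Let
then $X$ take the value $y_j$ with probability $\lambda_j$, $1 \le j \le n$. Then we have by
definition (1):

$$\mathscr{E}(X) = \sum_{j=1}^{n} \lambda_j y_j, \qquad \mathscr{E}(\varphi(X)) = \sum_{j=1}^{n} \lambda_j \varphi(y_j).$$

Thus (22) follows from (23).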
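
The simple-r.v. case of (22) can be checked numerically. The sketch below is
illustrative only: the values, the probabilities, and the convex function
$\varphi(y) = y^2$ are assumptions chosen for the example, and `expectation`
is a hypothetical helper.

```python
from fractions import Fraction

def expectation(values, probs, g=lambda y: y):
    """E(g(X)) for a simple r.v. taking values[j] with probability probs[j]."""
    return sum(p * g(y) for y, p in zip(values, probs))

# Assumed example data: X takes -1, 0, 3 with probabilities 1/2, 1/4, 1/4.
values = [Fraction(-1), Fraction(0), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def phi(y):
    """A convex function on the real line (assumed for illustration)."""
    return y * y

lhs = phi(expectation(values, probs))   # phi(E(X))
rhs = expectation(values, probs, phi)   # E(phi(X))
```

Here `lhs` and `rhs` are the two sides of (22); with $\varphi(y) = y^2$ their
difference `rhs - lhs` is the variance of $X$.
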
Finally, we prove a famous inequality that is almost trivial but very
useful.

Chebyshev inequality. If $\varphi$ is a strictly positive and increasing function
on $(0, \infty)$, $\varphi(u) = \varphi(-u)$, and $X$ is an r.v. such that $\mathscr{E}\{\varphi(X)\} < \infty$, then for
each $u > 0$:

$$\mathscr{P}\{|X| \ge u\} \le \frac{\mathscr{E}\{\varphi(X)\}}{\varphi(u)}.$$

PROOF. We have by the mean value theorem:

$$\mathscr{E}\{\varphi(X)\} = \int_\Omega \varphi(X)\,d\mathscr{P} \ge \int_{\{|X| \ge u\}} \varphi(X)\,d\mathscr{P} \ge \varphi(u)\,\mathscr{P}\{|X| \ge u\}$$

from which the inequality follows.

The most familiar application is when $\varphi(u) = |u|^p$ for $0 < p < \infty$, so
that the inequality yields an upper bound for the "tail" probability in terms of
an absolute moment.
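
As an illustration of this application, the sketch below compares the tail
probability with the bound $\mathscr{E}(|X|^p)/u^p$ for a small discrete
distribution. The distribution (uniform on $\{-2, \ldots, 2\}$) and the helper
names are assumptions made for the example, and $p$ is restricted to positive
integers so that the arithmetic stays exact.

```python
from fractions import Fraction

def tail_prob(pmf, u):
    """P(|X| >= u)."""
    return sum(q for x, q in pmf.items() if abs(x) >= u)

def abs_moment(pmf, p):
    """E(|X|^p) for a positive integer p."""
    return sum(q * abs(x) ** p for x, q in pmf.items())

def chebyshev_bound(pmf, u, p):
    """Upper bound E(|X|^p) / u^p for P(|X| >= u)."""
    return abs_moment(pmf, p) / Fraction(u) ** p

# Assumed example: X uniform on {-2, -1, 0, 1, 2}.
pmf = {x: Fraction(1, 5) for x in range(-2, 3)}
```

For this distribution `tail_prob(pmf, 2)` is below both
`chebyshev_bound(pmf, 2, 1)` and `chebyshev_bound(pmf, 2, 2)`; which moment
gives the sharper bound depends on the distribution and on $u$.
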


EXERCISES 


9. Another proof of (14): verify it first for simple r.v.’s and then use 
Exercise 7 of Sec. 3.2. 

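10. Prove that if $0 \le r < r'$ and $\mathscr{E}(|X|^{r'}) < \infty$, then $\mathscr{E}(|X|^r) < \infty$. Also
that $\mathscr{E}(|X|^r) < \infty$ if and only if $\mathscr{E}(|X - a|^r) < \infty$ for every $a$.

*11. If $\mathscr{E}(X^2) = 1$ and $\mathscr{E}(|X|) \ge a > 0$, then $\mathscr{P}\{|X| \ge \lambda a\} \ge (1 - \lambda)^2 a^2$
for $0 \le \lambda \le 1$.

*12. If $X \ge 0$ and $Y \ge 0$, $p \ge 0$, then $\mathscr{E}\{(X + Y)^p\} \le 2^p\{\mathscr{E}(X^p) + \mathscr{E}(Y^p)\}$.
If $p > 1$, the factor $2^p$ may be replaced by $2^{p-1}$. If $0 \le p \le 1$, it
may be replaced by 1.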

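*13. If $X_j \ge 0$, then

$$\mathscr{E}\Bigl\{\Bigl(\sum_{j=1}^{n} X_j\Bigr)^p\Bigr\} \le \text{ or } \ge \sum_{j=1}^{n} \mathscr{E}(X_j^p)$$

according as $p \le 1$ or $p \ge 1$.
*14. If $p > 1$, we have

$$\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p \le \frac{1}{n} \sum_{j=1}^{n} |X_j|^p$$

and so

$$\mathscr{E}\Bigl\{\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p\Bigr\} \le \frac{1}{n} \sum_{j=1}^{n} \mathscr{E}(|X_j|^p);$$

we have also

$$\mathscr{E}\Bigl\{\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p\Bigr\} \le \Bigl\{\frac{1}{n} \sum_{j=1}^{n} \mathscr{E}(|X_j|^p)^{1/p}\Bigr\}^p.$$

Compare the inequalities.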
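15. If $p > 0$, $\mathscr{E}(|X|^p) < \infty$, then $x^p \mathscr{P}\{|X| > x\} = o(1)$ as $x \to \infty$.
Conversely, if $x^p \mathscr{P}\{|X| > x\} = o(1)$, then $\mathscr{E}(|X|^{p-\epsilon}) < \infty$ for $0 < \epsilon < p$.
*16. For any d.f. $F$ and any $a \ge 0$, we have

$$\int_{-\infty}^{\infty} [F(x + a) - F(x)]\,dx = a.$$

17. If $F$ is a d.f. such that $F(0-) = 0$, then

$$\int_0^{\infty} \{1 - F(x)\}\,dx = \int_0^{\infty} x\,dF(x) \le +\infty.$$

Thus if $X$ is a positive r.v., then we have

$$\mathscr{E}(X) = \int_0^{\infty} \mathscr{P}\{X > x\}\,dx = \int_0^{\infty} \mathscr{P}\{X \ge x\}\,dx.$$

18. Prove that $\int_{-\infty}^{\infty} |x|\,dF(x) < \infty$ if and only if

$$\int_{-\infty}^{0} F(x)\,dx < \infty \quad\text{and}\quad \int_0^{\infty} [1 - F(x)]\,dx < \infty.$$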

*19. If $\{X_n\}$ is a sequence of identically distributed r.v.'s with finite mean,
then

$$\lim_{n\to\infty} \frac{1}{n}\,\mathscr{E}\Bigl\{\max_{1 \le j \le n} |X_j|\Bigr\} = 0.$$

[HINT: Use Exercise 17 to express the mean of the maximum.]
20. For $r > 1$, we have

$$\int_0^{\infty} \frac{1}{u^r}\,\mathscr{E}(X \wedge u^r)\,du = \frac{r}{r-1}\,\mathscr{E}(X^{1/r}).$$

[HINT: By Exercise 17,

$$\mathscr{E}(X \wedge u^r) = \int_0^{u^r} \mathscr{P}(X > x)\,dx = \int_0^{u} \mathscr{P}(X^{1/r} > v)\,r v^{r-1}\,dv,$$

substitute and invert the order of the repeated integrations.]

3.3 Independence


We shall now introduce a fundamental new concept peculiar to the theory of 
probability, that of “(stochastic) independence”. 


DEFINITION OF INDEPENDENCE. The r.v.'s $\{X_j, 1 \le j \le n\}$ are said to be
(totally) independent iff for any linear Borel sets $\{B_j, 1 \le j \le n\}$ we have

$$\mathscr{P}\Bigl\{\bigcap_{j=1}^{n} (X_j \in B_j)\Bigr\} = \prod_{j=1}^{n} \mathscr{P}(X_j \in B_j). \tag{1}$$

The r.v.'s of an infinite family are said to be independent iff those in every
finite subfamily are. They are said to be pairwise independent iff every two
of them are independent.

Note that (1) implies that the r.v.'s in every subset of $\{X_j, 1 \le j \le n\}$
are also independent, since we may take some of the $B_j$'s as $\mathscr{R}^1$. On the other
hand, (1) is implied by the apparently weaker hypothesis: for every set of real
numbers $\{x_j, 1 \le j \le n\}$:

$$\mathscr{P}\Bigl\{\bigcap_{j=1}^{n} (X_j \le x_j)\Bigr\} = \prod_{j=1}^{n} \mathscr{P}(X_j \le x_j). \tag{2}$$


The proof of the equivalence of (1) and (2) is left as an exercise. In terms
of the p.m. $\mu^n$ induced by the random vector $(X_1, \ldots, X_n)$ on $(\mathscr{R}^n, \mathscr{B}^n)$, and
the p.m.'s $\{\mu_j, 1 \le j \le n\}$ induced by each $X_j$ on $(\mathscr{R}^1, \mathscr{B}^1)$, the relation (1)
may be written as

$$\mu^n\Bigl(\mathop{\times}_{j=1}^{n} B_j\Bigr) = \prod_{j=1}^{n} \mu_j(B_j), \tag{3}$$

where $\times_{j=1}^{n} B_j$ is the product set $B_1 \times \cdots \times B_n$ discussed in Sec. 3.1. Finally,
we may introduce the n-dimensional distribution function corresponding to
$\mu^n$, which is defined by the left side of (2) or in alternative notation:

$$F(x_1, \ldots, x_n) = \mathscr{P}\{X_j \le x_j, 1 \le j \le n\} = \mu^n\Bigl(\mathop{\times}_{j=1}^{n} (-\infty, x_j]\Bigr);$$

then (2) may be written as

$$F(x_1, \ldots, x_n) = \prod_{j=1}^{n} F_j(x_j).$$

From now on when the probability space is fixed, a set in $\mathscr{F}$ will also be
called an event. The events $\{E_j, 1 \le j \le n\}$ are said to be independent iff their
indicators are independent; this is equivalent to: for any subset $\{j_1, \ldots, j_\ell\}$ of
$\{1, \ldots, n\}$, we have

$$\mathscr{P}\Bigl(\bigcap_{k=1}^{\ell} E_{j_k}\Bigr) = \prod_{k=1}^{\ell} \mathscr{P}(E_{j_k}). \tag{4}$$
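
Condition (4) must hold for every subfamily, not only for pairs. The sketch
below illustrates the difference on an assumed example: two independent fair
signs $x_1, x_2$ and the three events $\{x_1 = 1\}$, $\{x_2 = 1\}$,
$\{x_1 x_2 = 1\}$. The sample space and the helper `product_rule_holds` are
hypothetical names introduced for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed sample space: pairs of signs (x1, x2), each point has probability 1/4.
omega = {w: Fraction(1, 4) for w in product((1, -1), repeat=2)}

def prob(event):
    """P(event) for a subset of omega."""
    return sum(p for w, p in omega.items() if w in event)

E1 = {w for w in omega if w[0] == 1}
E2 = {w for w in omega if w[1] == 1}
E3 = {w for w in omega if w[0] * w[1] == 1}
events = [E1, E2, E3]

def product_rule_holds(family):
    """Check P(intersection) == product of the probabilities for one subfamily."""
    inter = set(omega)
    rhs = Fraction(1)
    for e in family:
        inter &= e
        rhs *= prob(e)
    return prob(inter) == rhs

# Pairs only, versus every subfamily of size >= 2 as required by (4).
pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
total = all(
    product_rule_holds(sub)
    for size in range(2, len(events) + 1)
    for sub in combinations(events, size)
)
```

Each pair of events satisfies the product rule, but the triple intersection
has probability 1/4 rather than 1/8, so the three events are pairwise
independent without being independent.
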


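Theorem 3.3.1. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s and $\{f_j, 1 \le j \le n\}$
are Borel measurable functions, then $\{f_j(X_j), 1 \le j \le n\}$ are independent
r.v.'s.

PROOF. Let $A_j \in \mathscr{B}^1$, then $f_j^{-1}(A_j) \in \mathscr{B}^1$ by the definition of a Borel
measurable function. By Theorem 3.1.1, we have

$$\bigcap_{j=1}^{n} \{f_j(X_j) \in A_j\} = \bigcap_{j=1}^{n} \{X_j \in f_j^{-1}(A_j)\}.$$

Hence we have

$$\begin{aligned}
\mathscr{P}\Bigl\{\bigcap_{j=1}^{n} [f_j(X_j) \in A_j]\Bigr\}
&= \mathscr{P}\Bigl\{\bigcap_{j=1}^{n} [X_j \in f_j^{-1}(A_j)]\Bigr\} = \prod_{j=1}^{n} \mathscr{P}\{X_j \in f_j^{-1}(A_j)\} \\
&= \prod_{j=1}^{n} \mathscr{P}\{f_j(X_j) \in A_j\}.
\end{aligned}$$

This being true for every choice of the $A_j$'s, the $f_j(X_j)$'s are independent
by definition.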


The proof of the next theorem is similar and is left as an exercise. 


Theorem 3.3.2. Let $1 \le n_1 < n_2 < \cdots < n_k = n$; $f_1$ a Borel measurable
function of $n_1$ variables, $f_2$ one of $n_2 - n_1$ variables, ..., $f_k$ one of $n_k - n_{k-1}$
variables. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s then the $k$ r.v.'s

$$f_1(X_1, \ldots, X_{n_1}),\ f_2(X_{n_1+1}, \ldots, X_{n_2}),\ \ldots,\ f_k(X_{n_{k-1}+1}, \ldots, X_{n_k})$$

are independent.


Theorem 3.3.3. If $X$ and $Y$ are independent and both have finite expectations,
then

$$\mathscr{E}(XY) = \mathscr{E}(X)\mathscr{E}(Y). \tag{5}$$

PROOF. We give two proofs in detail of this important result to illustrate
the methods. Cf. the two proofs of (14) in Sec. 3.2, one indicated there, and
one in Exercise 9 of Sec. 3.2.


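First proof. Suppose first that the two r.v.'s $X$ and $Y$ are both discrete
belonging respectively to the weighted partitions $\{\Lambda_j; c_j\}$ and $\{M_k; d_k\}$ such
that $\Lambda_j = \{X = c_j\}$, $M_k = \{Y = d_k\}$. Thus

$$\mathscr{E}(X) = \sum_j c_j \mathscr{P}(\Lambda_j), \qquad \mathscr{E}(Y) = \sum_k d_k \mathscr{P}(M_k).$$

Now we have

$$\Omega = \Bigl(\bigcup_j \Lambda_j\Bigr) \cap \Bigl(\bigcup_k M_k\Bigr) = \bigcup_{j,k} (\Lambda_j M_k)$$

and

$$X(\omega)Y(\omega) = c_j d_k \quad\text{if } \omega \in \Lambda_j M_k.$$

Hence the r.v. $XY$ is discrete and belongs to the superposed partitions
$\{\Lambda_j M_k; c_j d_k\}$ with both the $j$ and the $k$ varying independently of each other.
Since $X$ and $Y$ are independent, we have for every $j$ and $k$:

$$\mathscr{P}(\Lambda_j M_k) = \mathscr{P}(X = c_j; Y = d_k) = \mathscr{P}(X = c_j)\mathscr{P}(Y = d_k) = \mathscr{P}(\Lambda_j)\mathscr{P}(M_k);$$

and consequently by definition (1) of Sec. 3.2:

$$\mathscr{E}(XY) = \sum_{j,k} c_j d_k \mathscr{P}(\Lambda_j M_k) = \Bigl\{\sum_j c_j \mathscr{P}(\Lambda_j)\Bigr\}\Bigl\{\sum_k d_k \mathscr{P}(M_k)\Bigr\} = \mathscr{E}(X)\mathscr{E}(Y).$$

Thus (5) is true in this case.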

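Now let $X$ and $Y$ be arbitrary positive r.v.'s with finite expectations. Then,
according to the discussion at the beginning of Sec. 3.2, there are discrete
r.v.'s $X_m$ and $Y_m$ such that $\mathscr{E}(X_m) \uparrow \mathscr{E}(X)$ and $\mathscr{E}(Y_m) \uparrow \mathscr{E}(Y)$. Furthermore,
for each $m$, $X_m$ and $Y_m$ are independent. Note that for the independence of
discrete r.v.'s it is sufficient to verify the relation (1) when the $B_j$'s are their
possible values and "$\in$" is replaced by "$=$" (why?). Here we have

$$\begin{aligned}
\mathscr{P}\Bigl\{X_m = \frac{n}{2^m};\ Y_m = \frac{n'}{2^m}\Bigr\}
&= \mathscr{P}\Bigl\{\frac{n}{2^m} \le X < \frac{n+1}{2^m};\ \frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Bigr\} \\
&= \mathscr{P}\Bigl\{\frac{n}{2^m} \le X < \frac{n+1}{2^m}\Bigr\}\,\mathscr{P}\Bigl\{\frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Bigr\} \\
&= \mathscr{P}\Bigl\{X_m = \frac{n}{2^m}\Bigr\}\,\mathscr{P}\Bigl\{Y_m = \frac{n'}{2^m}\Bigr\}.
\end{aligned}$$

The independence of $X_m$ and $Y_m$ is also a consequence of Theorem 3.3.1,
since $X_m = [2^m X]/2^m$, where $[X]$ denotes the greatest integer in $X$. Finally, it
is clear that $X_m Y_m$ is increasing with $m$ and

$$0 \le XY - X_m Y_m = X(Y - Y_m) + Y_m(X - X_m) \to 0.$$

Hence, by the monotone convergence theorem, we conclude that

$$\begin{aligned}
\mathscr{E}(XY) &= \lim_{m\to\infty} \mathscr{E}(X_m Y_m) = \lim_{m\to\infty} \mathscr{E}(X_m)\mathscr{E}(Y_m) \\
&= \lim_{m\to\infty} \mathscr{E}(X_m) \lim_{m\to\infty} \mathscr{E}(Y_m) = \mathscr{E}(X)\mathscr{E}(Y).
\end{aligned}$$

Thus (5) is true also in this case. For the general case, we use (2) and (3) of
Sec. 3.2 and observe that the independence of $X$ and $Y$ implies that of $X^+$ and
$Y^+$; $X^-$ and $Y^-$; and so on. This again can be seen directly or as a consequence
of Theorem 3.3.1. Hence we have, under our finiteness hypothesis:

$$\begin{aligned}
\mathscr{E}(XY) &= \mathscr{E}((X^+ - X^-)(Y^+ - Y^-)) \\
&= \mathscr{E}(X^+Y^+ - X^+Y^- - X^-Y^+ + X^-Y^-) \\
&= \mathscr{E}(X^+Y^+) - \mathscr{E}(X^+Y^-) - \mathscr{E}(X^-Y^+) + \mathscr{E}(X^-Y^-) \\
&= \mathscr{E}(X^+)\mathscr{E}(Y^+) - \mathscr{E}(X^+)\mathscr{E}(Y^-) - \mathscr{E}(X^-)\mathscr{E}(Y^+) + \mathscr{E}(X^-)\mathscr{E}(Y^-) \\
&= \{\mathscr{E}(X^+) - \mathscr{E}(X^-)\}\{\mathscr{E}(Y^+) - \mathscr{E}(Y^-)\} = \mathscr{E}(X)\mathscr{E}(Y).
\end{aligned}$$

The first proof is completed.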


Second proof. Consider the random vector $(X, Y)$ and let the p.m.
induced by it be $\mu^2(dx, dy)$. Then we have by Theorem 3.2.3:

$$\mathscr{E}(XY) = \int_\Omega XY\,d\mathscr{P} = \iint_{\mathscr{R}^2} xy\,\mu^2(dx, dy).$$

By (3), the last integral is equal to

$$\int_{\mathscr{R}^1}\int_{\mathscr{R}^1} xy\,\mu_1(dx)\,\mu_2(dy) = \int_{\mathscr{R}^1} x\,\mu_1(dx) \int_{\mathscr{R}^1} y\,\mu_2(dy) = \mathscr{E}(X)\mathscr{E}(Y),$$

finishing the proof! Observe that we are using here a very simple form of
Fubini's theorem (see below). Indeed, the second proof appears to be so much
shorter only because we are relying on the theory of "product measure"
$\mu^2 = \mu_1 \times \mu_2$ on $(\mathscr{R}^2, \mathscr{B}^2)$. This is another illustration of the method of reduction
mentioned in connection with the proof of (17) in Sec. 3.2.

Corollary. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s with finite expectations,
then

$$\mathscr{E}\Bigl(\prod_{j=1}^{n} X_j\Bigr) = \prod_{j=1}^{n} \mathscr{E}(X_j). \tag{6}$$

This follows at once by induction from (5), provided we observe that the
two r.v.'s

$$\prod_{j=1}^{k} X_j \quad\text{and}\quad \prod_{j=k+1}^{n} X_j$$

are independent for each $k$, $1 \le k \le n - 1$. A rigorous proof of this fact may
be supplied by Theorem 3.3.2.


Do independent random variables exist? Here we can take the cue from 
the intuitive background of probability theory which not only has given rise 
historically to this branch of mathematical discipline, but remains a source of 
inspiration, inculcating a way of thinking peculiar to the discipline. It may 
be said that no one could have learned the subject properly without acquiring 
some feeling for the intuitive content of the concept of stochastic indepen- 
dence, and through it, certain degrees of dependence. Briefly then: events are 
determined by the outcomes of random trials. If an unbiased coin is tossed 
and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it 
takes these two values with roughly the probabilities $\frac{1}{2}$ each. Repeated tossing
will produce a sequence of outcomes. If now a die is cast, the outcome may 
be similarly represented by an r.v. taking the six values 1 to 6; again this 
may be repeated to produce a sequence. Next we may draw a card from a 
pack or a ball from an urn, or take a measurement of a physical quantity 
sampled from a given population, or make an observation of some fortuitous 
natural phenomenon, the outcomes in the last two cases being r.v.’s taking 
some rational values in terms of certain units; and so on. Now it is very 
easy to conceive of undertaking these various trials under conditions such 
that their respective outcomes do not appreciably affect each other; indeed it 
would take more imagination to conceive the opposite! In this circumstance, 
idealized, the trials are carried out “independently of one another” and the 
corresponding r.v.’s are “independent” according to definition. We have thus 
“constructed” sets of independent r.v.’s in varied contexts and with various 
distributions (although they are always discrete on realistic grounds), and the 
whole process can be continued indefinitely. 

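Can such a construction be made rigorous? We begin by an easy special
case.

Example 1. Let $n \ge 2$ and $(\Omega_j, \mathscr{S}_j, \mathscr{P}_j)$ be $n$ discrete probability spaces. We define
the product space

$$\Omega^n = \Omega_1 \times \cdots \times \Omega_n \quad (n \text{ factors})$$

to be the space of all ordered n-tuples $\omega^n = (\omega_1, \ldots, \omega_n)$, where each $\omega_j \in \Omega_j$. The
product B.F. $\mathscr{S}^n$ is simply the collection of all subsets of $\Omega^n$, just as $\mathscr{S}_j$ is that for $\Omega_j$.
Recall that (Example 1 of Sec. 2.2) the p.m. $\mathscr{P}_j$ is determined by its value for each
point of $\Omega_j$. Since $\Omega^n$ is also a countable set, we may define a p.m. $\mathscr{P}^n$ on $\mathscr{S}^n$ by the
following assignment:

$$\mathscr{P}^n(\{\omega^n\}) = \prod_{j=1}^{n} \mathscr{P}_j(\{\omega_j\}), \tag{7}$$

namely, to the n-tuple $(\omega_1, \ldots, \omega_n)$ the probability to be assigned is the product of the
probabilities originally assigned to each component $\omega_j$ by $\mathscr{P}_j$. This p.m. will be called
the product measure derived from the p.m.'s $\{\mathscr{P}_j, 1 \le j \le n\}$ and denoted by $\times_{j=1}^{n} \mathscr{P}_j$.
It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product
property, extending its definition (7): if $S_j \in \mathscr{S}_j$, $1 \le j \le n$, then

$$\mathscr{P}^n\Bigl(\mathop{\times}_{j=1}^{n} S_j\Bigr) = \prod_{j=1}^{n} \mathscr{P}_j(S_j). \tag{8}$$

To see this, we observe that the left side is, by definition, equal to

$$\begin{aligned}
\sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \mathscr{P}^n(\{\omega_1, \ldots, \omega_n\})
&= \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \prod_{j=1}^{n} \mathscr{P}_j(\{\omega_j\}) \\
&= \prod_{j=1}^{n} \Bigl\{\sum_{\omega_j \in S_j} \mathscr{P}_j(\{\omega_j\})\Bigr\} = \prod_{j=1}^{n} \mathscr{P}_j(S_j),
\end{aligned}$$

the second equation being a matter of simple algebra.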
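Now let $X_j$ be an r.v. (namely an arbitrary function) on $\Omega_j$; $B_j$ be an arbitrary
Borel set; and $S_j = X_j^{-1}(B_j)$, namely:

$$S_j = \{\omega_j \in \Omega_j : X_j(\omega_j) \in B_j\}$$

so that $S_j \in \mathscr{S}_j$. We have then by (8):

$$\mathscr{P}^n\Bigl\{\mathop{\times}_{j=1}^{n} [X_j \in B_j]\Bigr\} = \mathscr{P}^n\Bigl\{\mathop{\times}_{j=1}^{n} S_j\Bigr\} = \prod_{j=1}^{n} \mathscr{P}_j(S_j) = \prod_{j=1}^{n} \mathscr{P}_j\{X_j \in B_j\}. \tag{9}$$

To each function $X_j$ on $\Omega_j$ let correspond the function $\tilde{X}_j$ on $\Omega^n$ defined below, in
which $\omega = (\omega_1, \ldots, \omega_n)$ and each "coordinate" $\omega_j$ is regarded as a function of the
point $\omega$:

$$\forall \omega \in \Omega^n : \tilde{X}_j(\omega) = X_j(\omega_j).$$

Then we have

$$\bigcap_{j=1}^{n} \{\omega : \tilde{X}_j(\omega) \in B_j\} = \mathop{\times}_{j=1}^{n} \{\omega_j : X_j(\omega_j) \in B_j\}$$

since

$$\{\omega : \tilde{X}_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j : X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n.$$

It follows from (9) that

$$\mathscr{P}^n\Bigl\{\bigcap_{j=1}^{n} [\tilde{X}_j \in B_j]\Bigr\} = \prod_{j=1}^{n} \mathscr{P}^n\{\tilde{X}_j \in B_j\}.$$

Therefore the r.v.'s $\{\tilde{X}_j, 1 \le j \le n\}$ are independent.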
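
The construction of Example 1 can be carried out mechanically for small
spaces. The sketch below is an illustration only: the two factor spaces (a
biased coin and a three-point space), the coordinate r.v.'s `X1`, `X2`, and
the helper names are assumptions made for the example. It builds the product
measure by (7), checks the product property (8) on a product set, and compares
$\mathscr{E}(XY)$ with $\mathscr{E}(X)\mathscr{E}(Y)$ as in Theorem 3.3.3.

```python
from fractions import Fraction
from itertools import product

# Assumed factor spaces: a biased coin and a uniform three-point space.
P1 = {"h": Fraction(2, 3), "t": Fraction(1, 3)}
P2 = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Definition (7): the mass of each pair is the product of the component masses.
Pn = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

def measure(pm, subset):
    """Measure of a subset of a discrete space, by summing point masses."""
    return sum(pm[w] for w in subset)

def product_property(S1, S2):
    """Check (8) for the product set S1 x S2."""
    return measure(Pn, product(S1, S2)) == measure(P1, S1) * measure(P2, S2)

# Coordinate r.v.'s on the product space: each depends on one coordinate only.
def X1(w):
    return 1 if w[0] == "h" else 0

def X2(w):
    return w[1]

def expect(g):
    """Expectation of g with respect to the product measure Pn."""
    return sum(g(w) * p for w, p in Pn.items())
```

For instance `product_property({"h"}, {1, 3})` checks (8) for one product set,
and `expect(lambda w: X1(w) * X2(w))` can be compared with
`expect(X1) * expect(X2)`.
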


Example 2. Let $\mathscr{U}^n$ be the n-dimensional cube (immaterial whether it is closed or
not):

$$\mathscr{U}^n = \{(x_1, \ldots, x_n) : 0 \le x_j \le 1;\ 1 \le j \le n\}.$$

The trace on $\mathscr{U}^n$ of $(\mathscr{R}^n, \mathscr{B}^n, m^n)$, where $\mathscr{R}^n$ is the n-dimensional Euclidean space,
$\mathscr{B}^n$ and $m^n$ the usual Borel field and measure, is a probability space. The p.m. $m^n$ on
$\mathscr{U}^n$ is a product measure having the property analogous to (8). Let $\{f_j, 1 \le j \le n\}$
be $n$ Borel measurable functions of one variable, and

$$X_j((x_1, \ldots, x_n)) = f_j(x_j).$$

Then $\{X_j, 1 \le j \le n\}$ are independent r.v.'s. In particular if $f_j(x_j) \equiv x_j$, we obtain
the $n$ coordinate variables in the cube. The reader may recall the term "independent
variables" used in calculus, particularly for integration in several variables. The two
usages have some accidental rapport.


Example 3. The point of Example 2 is that there is a ready-made product measure
there. Now on $(\mathscr{R}^n, \mathscr{B}^n)$ it is possible to construct such a one based on given p.m.'s
on $(\mathscr{R}^1, \mathscr{B}^1)$. Let these be $\{\mu_j, 1 \le j \le n\}$; we define $\mu^n$ for product sets, in analogy
with (8), as follows:

$$\mu^n\Bigl(\mathop{\times}_{j=1}^{n} B_j\Bigr) = \prod_{j=1}^{n} \mu_j(B_j).$$

It remains to extend this definition to all of $\mathscr{B}^n$, or, more logically speaking, to prove
that there exists a p.m. $\mu^n$ on $\mathscr{B}^n$ that has the above "product property". The situation
is somewhat more complicated than in Example 1, just as Example 3 in Sec. 2.2 is
more complicated than Example 1 there. Indeed, the required construction is exactly
that of the corresponding Lebesgue-Stieltjes measure in $n$ dimensions. This will be
subsumed in the next theorem. Assuming that it has been accomplished, then sets of
$n$ independent r.v.'s can be defined just as in Example 2.

Example 4. Can we construct r.v.'s on the probability space $(\mathscr{U}, \mathscr{B}, m)$ itself, without
going to a product space? Indeed we can, but only by imbedding a product structure
in $\mathscr{U}$. The simplest case will now be described and we shall return to it in the next
chapter.

For each real number in (0,1], consider its binary digital expansion

$$x = .\epsilon_1 \epsilon_2 \cdots \epsilon_n \cdots = \sum_{n=1}^{\infty} \frac{\epsilon_n}{2^n}, \qquad \text{each } \epsilon_n = 0 \text{ or } 1. \tag{10}$$


This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is
countable and so of probability zero, hence whatever we decide to do with them will
be immaterial for our purposes. For the sake of definiteness, let us agree that only
expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ of $x$ is a
function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let
$\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set

$$\{x : \epsilon_j(x) = c_j, 1 \le j \le n\} = \bigcap_{j=1}^{n} \{x : \epsilon_j(x) = c_j\}$$

is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s, thus

$$x = .c_1 c_2 \cdots c_n \epsilon_{n+1} \epsilon_{n+2} \cdots$$

with the digits from the $(n + 1)$st on completely arbitrary. It is clear that this set is
just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand for each
$j$, the set $\{x : \epsilon_j(x) = c_j\}$ has probability $\frac{1}{2}$ for a similar reason. We have therefore

$$\mathscr{P}\{\epsilon_j = c_j, 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^{n} \Bigl(\frac{1}{2}\Bigr) = \prod_{j=1}^{n} \mathscr{P}\{\epsilon_j = c_j\}.$$

This being true for every choice of the $c_j$'s, the r.v.'s $\{\epsilon_j, j \ge 1\}$ are independent. Let
$\{f_j, j \ge 1\}$ be arbitrary functions with domain the two points $\{0, 1\}$, then $\{f_j(\epsilon_j), j \ge 1\}$
are also independent r.v.'s.

This example seems extremely special, but easy extensions are at hand (see
Exercises 13, 14, and 15 below).
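
The computation in Example 4 can be reproduced exactly for the first few
digits. The sketch below is illustrative and rests on stated assumptions: the
formula used in `digit` is one way to realize the convention "infinitely many
digits 1" on (0, 1], and the helper `prob_digits` measures a digit pattern by
counting dyadic intervals of a fixed depth, on each of which the first `depth`
digits are constant. Both helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import ceil

def digit(x, j):
    """j-th binary digit of x in (0, 1], using the expansion with infinitely many 1's."""
    return (ceil(x * 2 ** j) - 1) % 2

def prob_digits(pattern, depth):
    """Lebesgue measure of {x : digit(x, j) == c for every (j, c) in pattern}.

    Assumes every index j in pattern is <= depth. The first `depth` digits are
    constant on each interval (k/2^depth, (k+1)/2^depth], so it suffices to
    test the right endpoint of each interval and count the hits.
    """
    hits = 0
    for k in range(2 ** depth):
        x = Fraction(k + 1, 2 ** depth)
        if all(digit(x, j) == c for j, c in pattern.items()):
            hits += 1
    return Fraction(hits, 2 ** depth)

n = 3
# Every pattern (c_1, ..., c_n) should have probability 1/2^n ...
joint_matches_product = all(
    prob_digits(dict(enumerate(cs, start=1)), n) == Fraction(1, 2 ** n)
    for cs in product((0, 1), repeat=n)
)
# ... and every single digit should take each value with probability 1/2.
marginals_are_half = all(
    prob_digits({j: c}, n) == Fraction(1, 2)
    for j in range(1, n + 1) for c in (0, 1)
)
```

With these two facts the product relation displayed in the example holds for
every choice of the first $n$ digits.
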


We are now ready to state and prove the fundamental existence theorem 
of product measures. 


Theorem 3.3.4. Let a finite or infinite sequence of p.m.'s $\{\mu_j\}$ on $(\mathscr{R}^1, \mathscr{B}^1)$,
or equivalently their d.f.'s, be given. There exists a probability space
$(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of independent r.v.'s $\{X_j\}$ defined on it such that
for each $j$, $\mu_j$ is the p.m. of $X_j$.


PROOF. Without loss of generality we may suppose that the given
sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ be a probability space
in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m. Indeed this is possible if we
take $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ to be $(\mathscr{R}^1, \mathscr{B}^1, \mu_n)$ and $X_n$ to be the identical function of
the sample point $x$ in $\mathscr{R}^1$, now to be written as $\omega_n$ (cf. Exercise 3 of Sec. 3.1).

Now define the infinite product space

$$\Omega = \mathop{\times}_{n=1}^{\infty} \Omega_n$$

on the collection of all "points" $\omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$, where for each $n$,
$\omega_n$ is a point of $\Omega_n$. A subset $E$ of $\Omega$ will be called a "finite-product set" iff
it is of the form

$$E = \mathop{\times}_{n=1}^{\infty} F_n, \tag{11}$$

where each $F_n \in \mathscr{F}_n$ and all but a finite number of these $F_n$'s are equal to the
corresponding $\Omega_n$'s. Thus $\omega \in E$ if and only if $\omega_n \in F_n$, $n \ge 1$, but this is
actually a restriction only for a finite number of values of $n$. Let the collection
of subsets of $\Omega$, each of which is the union of a finite number of disjoint
finite-product sets, be $\mathscr{F}_0$. It is easy to see that the collection $\mathscr{F}_0$ is closed with respect
to complementation and pairwise intersection, hence it is a field. We shall take
the $\mathscr{F}$ in the theorem to be the B.F. generated by $\mathscr{F}_0$. This $\mathscr{F}$ is called the
product B.F. of the sequence $\{\mathscr{F}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathscr{F}_n$.

We define a set function $\mathscr{P}$ on $\mathscr{F}_0$ as follows. First, for each finite-product
set such as the $E$ given in (11) we set

$$\mathscr{P}(E) = \prod_{n=1}^{\infty} \mathscr{P}_n(F_n), \tag{12}$$

where all but a finite number of the factors on the right side are equal to one.
Next, if $E \in \mathscr{F}_0$ and

$$E = \bigcup_{k=1}^{n} E^{(k)},$$

where the $E^{(k)}$'s are disjoint finite-product sets, we put

$$\mathscr{P}(E) = \sum_{k=1}^{n} \mathscr{P}(E^{(k)}). \tag{13}$$
If a given set $E$ in $\mathscr{F}_0$ has two representations of the form above, then it is
not difficult to see though it is tedious to explain (try drawing some pictures!)
that the two definitions of $\mathscr{P}(E)$ agree. Hence the set function $\mathscr{P}$ is uniquely
defined on $\mathscr{F}_0$; it is clearly positive with $\mathscr{P}(\Omega) = 1$, and it is finitely additive
on $\mathscr{F}_0$ by definition. In order to verify countable additivity it is sufficient to
verify the axiom of continuity, by the remark after Theorem 2.2.1. Proceeding
by contraposition, suppose that there exist a $\delta > 0$ and a sequence of sets
$\{C_n, n \ge 1\}$ in $\mathscr{F}_0$ such that for each $n$, we have $C_n \supset C_{n+1}$ and
$\mathscr{P}(C_n) \ge \delta > 0$; we shall prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$. Note that each set $E$ in $\mathscr{F}_0$, as well
as each finite-product set, is "determined" by a finite number of coordinates
in the sense that there exists a finite integer $k$, depending only on $E$, such that
if $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega' = (\omega_1', \omega_2', \ldots)$ are two points of $\Omega$ with $\omega_j = \omega_j'$
for $1 \le j \le k$, then either both $\omega$ and $\omega'$ belong to $E$ or neither of them does.
To simplify the notation and with no real loss of generality (why?) we may
suppose that for each $n$, the set $C_n$ is determined by the first $n$ coordinates.
Given $\omega_1^0$, for any subset $E$ of $\Omega$ let us write $(E \mid \omega_1^0)$ for $\Omega_1 \times E_1$, where $E_1$ is
the set of points $(\omega_2, \omega_3, \ldots)$ in $\times_{n=2}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2, \omega_3, \ldots) \in E$. If $\omega_1^0$
does not appear as first coordinate of any point in $E$, then $(E \mid \omega_1^0) = \emptyset$. Note
that if $E \in \mathscr{F}_0$, then $(E \mid \omega_1^0) \in \mathscr{F}_0$ for each $\omega_1^0$. We claim that there exists an
$\omega_1^0$ such that for every $n$, we have $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$. To see this we begin
with the equation

$$\mathscr{P}(C_n) = \int_{\Omega_1} \mathscr{P}((C_n \mid \omega_1))\,\mathscr{P}_1(d\omega_1). \tag{14}$$

This is trivial if $C_n$ is a finite-product set by definition (12), and follows by
addition for any $C_n$ in $\mathscr{F}_0$. Now put $B_n = \{\omega_1 : \mathscr{P}((C_n \mid \omega_1)) \ge \delta/2\}$, a subset
of $\Omega_1$; then it follows that

$$\delta \le \int_{B_n} 1\,\mathscr{P}_1(d\omega_1) + \int_{B_n^c} \frac{\delta}{2}\,\mathscr{P}_1(d\omega_1)$$

and so $\mathscr{P}_1(B_n) \ge \delta/2$ for every $n \ge 1$. Since $B_n$ is decreasing with $C_n$, we have
$\mathscr{P}_1(\bigcap_{n=1}^{\infty} B_n) \ge \delta/2$. Choose any $\omega_1^0$ in $\bigcap_{n=1}^{\infty} B_n$. Then $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$.
Repeating the argument for the set $(C_n \mid \omega_1^0)$, we see that there exists an $\omega_2^0$
such that for every $n$, $\mathscr{P}((C_n \mid \omega_1^0, \omega_2^0)) \ge \delta/4$, where
$(C_n \mid \omega_1^0, \omega_2^0) = ((C_n \mid \omega_1^0) \mid \omega_2^0)$ is of the form $\Omega_1 \times \Omega_2 \times E_3$ and $E_3$ is the set
$(\omega_3, \omega_4, \ldots)$ in $\times_{n=3}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2^0, \omega_3, \omega_4, \ldots) \in C_n$; and so forth by induction. Thus
for each $k \ge 1$, there exists $\omega_k^0$ such that

$$\forall n : \mathscr{P}((C_n \mid \omega_1^0, \ldots, \omega_k^0)) \ge \frac{\delta}{2^k}.$$


Consider the point $\omega^0 = (\omega_1^0, \omega_2^0, \ldots, \omega_n^0, \ldots)$. Since $(C_k \mid \omega_1^0, \ldots, \omega_k^0) \ne \emptyset$,
there is a point in $C_k$ whose first $k$ coordinates are the same as those of $\omega^0$;
since $C_k$ is determined by the first $k$ coordinates, it follows that $\omega^0 \in C_k$. This
is true for all $k$, hence $\omega^0 \in \bigcap_{k=1}^{\infty} C_k$, as was to be shown.

We have thus proved that $\mathscr{P}$ as defined on $\mathscr{F}_0$ is a p.m. The next theorem,
which is a generalization of the extension theorem discussed in Theorem 2.2.2
and whose proof can be found in Halmos [4], ensures that it can be extended
to $\mathscr{F}$ as desired. This extension is called the product measure of the sequence
$\{\mathscr{P}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathscr{P}_n$ with domain the product field $\times_{n=1}^{\infty} \mathscr{F}_n$.

Theorem 3.3.5. Let $\mathscr{F}_0$ be a field of subsets of an abstract space $\Omega$, and $\mathscr{P}$
a p.m. on $\mathscr{F}_0$. There exists a unique p.m. on the B.F. generated by $\mathscr{F}_0$ that
agrees with $\mathscr{P}$ on $\mathscr{F}_0$.

The uniqueness is proved in Theorem 2.2.3.

Returning to Theorem 3.3.4, it remains to show that the r.v.'s $\{\omega_j\}$ are
independent. For each $k \ge 2$, the independence of $\{\omega_j, 1 \le j \le k\}$ is a
consequence of the definition in (12); this being so for every $k$, the independence
of the infinite sequence follows by definition.

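We now give a statement of Fubini's theorem for product measures,
which has already been used above in special cases. It is sufficient to consider
the case $n = 2$. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$ and $\mathscr{P} = \mathscr{P}_1 \times \mathscr{P}_2$ be the
product space, B.F., and measure respectively.

Let $(\overline{\mathscr{F}}_1, \overline{\mathscr{P}}_1)$, $(\overline{\mathscr{F}}_2, \overline{\mathscr{P}}_2)$ and $(\overline{\mathscr{F}_1 \times \mathscr{F}_2}, \overline{\mathscr{P}_1 \times \mathscr{P}_2})$ be the completions
of $(\mathscr{F}_1, \mathscr{P}_1)$, $(\mathscr{F}_2, \mathscr{P}_2)$, and $(\mathscr{F}_1 \times \mathscr{F}_2, \mathscr{P}_1 \times \mathscr{P}_2)$, respectively, according to
Theorem 2.2.5.

Fubini's theorem. Suppose that $f$ is measurable with respect to $\overline{\mathscr{F}_1 \times \mathscr{F}_2}$
and integrable with respect to $\overline{\mathscr{P}_1 \times \mathscr{P}_2}$. Then

(i) for each $\omega_1 \in \Omega_1 \setminus N_1$ where $N_1 \in \overline{\mathscr{F}}_1$ and $\overline{\mathscr{P}}_1(N_1) = 0$, $f(\omega_1, \cdot)$ is
measurable with respect to $\overline{\mathscr{F}}_2$ and integrable with respect to $\overline{\mathscr{P}}_2$;
(ii) the integral

$$\int_{\Omega_2} f(\cdot, \omega_2)\,\overline{\mathscr{P}}_2(d\omega_2)$$

is measurable with respect to $\overline{\mathscr{F}}_1$ and integrable with respect to $\overline{\mathscr{P}}_1$;
(iii) the following equation between a double and a repeated integral
holds:

$$\iint_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\,\overline{\mathscr{P}_1 \times \mathscr{P}_2}(d\omega) = \int_{\Omega_1} \Bigl[\int_{\Omega_2} f(\omega_1, \omega_2)\,\overline{\mathscr{P}}_2(d\omega_2)\Bigr] \overline{\mathscr{P}}_1(d\omega_1). \tag{15}$$

Furthermore, suppose $f$ is positive and measurable with respect to
$\overline{\mathscr{F}_1 \times \mathscr{F}_2}$; then if either member of (15) exists finite or infinite, so does the
other also and the equation holds.

Finally if we remove all the completion signs (the overbars) in the hypotheses
as well as the conclusions above, the resulting statement is true with the
exceptional set $N_1$ in (i) empty, provided the word "integrable" in (i) be
replaced by "has an integral."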

The theorem remains true if the p.m.'s are replaced by $\sigma$-finite measures;
see, e.g., Royden [5].

The reader should readily recognize the particular cases of the theorem 
with one or both of the factor measures discrete, so that the corresponding 
integral reduces to a sum. Thus it includes the familiar theorem on evaluating 
a double series by repeated summation. 
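
In the discrete case the integrability hypothesis becomes absolute
summability of the double series, and it cannot be dropped. The sketch below
is a standard illustration, not taken from the text: the array `a` is an
assumed example that is not absolutely summable, and counting measure on the
positive integers is $\sigma$-finite rather than a p.m., so the example relies
on the $\sigma$-finite version mentioned above. Each row and each column has
at most two nonzero entries, so every inner sum is computed exactly.

```python
def a(m, n):
    """Entry of an assumed doubly indexed array: 1 on the diagonal, -1 just right of it."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def row_sum(m):
    """Sum over n >= 1 of a(m, n); only n = m and n = m + 1 contribute."""
    return sum(a(m, n) for n in range(1, m + 2))

def col_sum(n):
    """Sum over m >= 1 of a(m, n); only m = n and m = n - 1 contribute."""
    return sum(a(m, n) for m in range(1, n + 1))

K = 50  # number of rows and columns inspected
rows_then_cols = sum(row_sum(m) for m in range(1, K + 1))
cols_then_rows = sum(col_sum(n) for n in range(1, K + 1))
```

Every row sums to zero, while the first column sums to one and all later
columns to zero, so the two repeated sums differ; the sum of $|a(m, n)|$ is
infinite, which is why Fubini's theorem does not apply.
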

We close this section by stating a generalization of Theorem 3.3.4 to the
case where the finite-dimensional joint distributions of the sequence $\{X_j\}$ are
arbitrarily given, subject only to mutual consistency. This result is a particular
case of Kolmogorov's extension theorem, which is valid for an arbitrary family
of r.v.'s.

Let $m$ and $n$ be integers: $1 \le m < n$, and define $\pi_{mn}$ to be the "projection
map" of $\mathscr{B}^m$ onto $\mathscr{B}^n$ given by

$$\forall B \in \mathscr{B}^m : \pi_{mn}(B) = \{(x_1, \ldots, x_n) : (x_1, \ldots, x_m) \in B\}.$$

Theorem 3.3.6. For each $n \ge 1$, let $\mu^n$ be a p.m. on $(\mathscr{R}^n, \mathscr{B}^n)$ such that

$$\forall m < n : \mu^n \circ \pi_{mn} = \mu^m. \tag{16}$$

Then there exists a probability space $(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of r.v.'s $\{X_j\}$
on it such that for each $n$, $\mu^n$ is the n-dimensional p.m. of the vector

$$(X_1, \ldots, X_n).$$

Indeed, the $\Omega$ and $\mathscr{F}$ may be taken exactly as in Theorem 3.3.4 to be the
product space

$$\mathop{\times}_j \Omega_j \quad\text{and}\quad \mathop{\times}_j \mathscr{F}_j$$

where $(\Omega_j, \mathscr{F}_j) = (\mathscr{R}^1, \mathscr{B}^1)$ for each $j$; only $\mathscr{P}$ is now more general. In terms
of d.f.'s, the consistency condition (16) may be stated as follows. For each
$m \ge 1$ and $(x_1, \ldots, x_m) \in \mathscr{R}^m$, we have if $n > m$:

$$\lim_{x_{m+1} \to \infty, \ldots, x_n \to \infty} F_n(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) = F_m(x_1, \ldots, x_m).$$


For a proof of this first fundamental theorem in the theory of stochastic 
processes, see Kolmogorov [8].

EXERCISES


1. Show that two r.v.’s on $(\Omega, \mathcal{F})$ may be independent according to one
p.m. $\mathcal{P}$ but not according to another!

*2. If $X_1$ and $X_2$ are independent r.v.’s each assuming the values +1
and −1 with probability $\frac{1}{2}$, then the three r.v.’s $\{X_1, X_2, X_1 X_2\}$ are pairwise
independent but not totally independent. Find an example of n r.v.’s such that
every n − 1 of them are independent but not all of them. Find an example
where the events $\Lambda_1$ and $\Lambda_2$ are independent, $\Lambda_1$ and $\Lambda_3$ are independent, but
$\Lambda_1$ and $\Lambda_2 \cup \Lambda_3$ are not independent.

*3. If the events $\{E_\alpha, \alpha \in A\}$ are independent, then so are the events
$\{F_\alpha, \alpha \in A\}$, where each $F_\alpha$ may be $E_\alpha$ or $E_\alpha^c$; also if $\{A_\beta, \beta \in B\}$, where
B is an arbitrary index set, is a collection of disjoint countable subsets of A,
then the events

$\bigcup_{\alpha \in A_\beta} E_\alpha, \quad \beta \in B,$

are independent.

4. Fields or B.F.’s $\mathcal{F}_\alpha (\subset \mathcal{F})$ of any family are said to be independent iff
any collection of events, one from each $\mathcal{F}_\alpha$, forms a set of independent events.
Let $\mathcal{F}_\alpha^0$ be a field generating $\mathcal{F}_\alpha$. Prove that if the fields $\mathcal{F}_\alpha^0$ are independent,
then so are the B.F.’s $\mathcal{F}_\alpha$. Is the same conclusion true if the fields are replaced
by arbitrary generating sets? Prove, however, that the conditions (1) and (2)
are equivalent. [HINT: Use Theorem 2.1.2.]

5. If $\{X_\alpha\}$ is a family of independent r.v.’s, then the B.F.’s generated
by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this
proposition.]

6. The r.v. X is independent of itself if and only if it is constant with
probability one. Can X and f(X) be independent where $f \in \mathcal{B}^1$?

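7. If $\{E_j, 1 \le j < \infty\}$ are independent events then

$\mathcal{P}\left( \bigcap_{j=1}^{\infty} E_j \right) = \prod_{j=1}^{\infty} \mathcal{P}(E_j),$

where the infinite product is defined to be the obvious limit; similarly

$\mathcal{P}\left( \bigcup_{j=1}^{\infty} E_j \right) = 1 - \prod_{j=1}^{\infty} (1 - \mathcal{P}(E_j)).$
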


8. Let $\{X_j, 1 \le j \le n\}$ be independent with d.f.’s $\{F_j, 1 \le j \le n\}$. Find
the d.f. of $\max_j X_j$ and $\min_j X_j$.

*9. If X and Y are independent and $\mathcal{E}(X)$ exists, then for any Borel set
B, we have

$\int_{\{Y \in B\}} X\, d\mathcal{P} = \mathcal{E}(X)\, \mathcal{P}(Y \in B).$

*10. If X and Y are independent and for some p > 0: $\mathcal{E}(|X + Y|^p) < \infty$,
then $\mathcal{E}(|X|^p) < \infty$ and $\mathcal{E}(|Y|^p) < \infty$.

11. If X and Y are independent, $\mathcal{E}(|X|^p) < \infty$ for some $p \ge 1$, and
$\mathcal{E}(Y) = 0$, then $\mathcal{E}(|X + Y|^p) \ge \mathcal{E}(|X|^p)$. [This is a case of Theorem 9.3.2;
but try a direct proof!]

12. The r.v.’s $\{\epsilon_j\}$ in Example 4 are related to the “Rademacher
functions”:

$r_j(x) = \operatorname{sgn}(\sin 2^j \pi x).$

What is the precise relation?
13. Generalize Example 4 by considering any s-ary expansion where $s \ge 2$
is an integer:

$x = \sum_{n=1}^{\infty} \frac{\epsilon_n}{s^n}$, where $\epsilon_n = 0, 1, \ldots, s - 1$.

*14. Modify Example 4 so that to each x in [0, 1] there corresponds a
sequence of independent and identically distributed r.v.’s $\{\epsilon_n, n \ge 1\}$, each
taking the values 1 and 0 with probabilities p and 1 − p, respectively, where
0 < p < 1. Such a sequence will be referred to as a “coin-tossing game (with
probability p for heads)”; when $p = \frac{1}{2}$ it is said to be “fair”.

15. Modify Example 4 further so as to allow $\epsilon_n$ to take the values 1 and
0 with probabilities $p_n$ and $1 - p_n$, where $0 < p_n < 1$, but $p_n$ depends on n.
16. Generalize (14) to 


Peery 


es [ PEN ois coca SOON. 


where C(1,...,k) is the set of $(\omega_1, \ldots, \omega_k)$ which appears as the first k
coordinates of at least one point in C and in the second equation $\omega_1, \ldots, \omega_k$
are regarded as functions of $\omega$.


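*17. For arbitrary events $\{E_j, 1 \le j \le n\}$, we have

$\mathcal{P}\left( \bigcup_{j=1}^{n} E_j \right) \ge \sum_{j=1}^{n} \mathcal{P}(E_j) - \sum_{1 \le j < k \le n} \mathcal{P}(E_j E_k).$

If $\forall n$: $\{E_j^{(n)}, 1 \le j \le n\}$ are independent events, and

$\mathcal{P}\left( \bigcup_{j=1}^{n} E_j^{(n)} \right) \to 0$ as $n \to \infty$,

then

$\mathcal{P}\left( \bigcup_{j=1}^{n} E_j^{(n)} \right) \sim \sum_{j=1}^{n} \mathcal{P}(E_j^{(n)}).$
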


18. Prove that Bx BA Bx ZB, where ZF is the completion of B with 
respect to the Lebesgue measure; similarly 4 x &. 


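19. If $f \in \mathcal{F}_1 \times \mathcal{F}_2$ and

$\iint_{\Omega_1 \times \Omega_2} |f|\, d(\mathcal{P}_1 \times \mathcal{P}_2) < \infty,$

then

$\int_{\Omega_1} \left[ \int_{\Omega_2} f\, d\mathcal{P}_2 \right] d\mathcal{P}_1 = \int_{\Omega_2} \left[ \int_{\Omega_1} f\, d\mathcal{P}_1 \right] d\mathcal{P}_2.$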


20. A typical application of Fubini’s theorem is as follows. If f is a
Lebesgue measurable function of (x, y) such that f(x, y) = 0 for each $x \in \mathcal{R}^1$
and $y \notin N_x$, where $m(N_x) = 0$ for each x, then we have also f(x, y) = 0 for
each $y \notin N$ and $x \notin N'_y$, where $m(N) = 0$ and $m(N'_y) = 0$ for each $y \notin N$.


4 Convergence concepts 


4.1 Various modes of convergence 


As numerical-valued functions, the convergence of a sequence of r.v.’s
$\{X_n, n \ge 1\}$, to be denoted simply by $\{X_n\}$ below, is a well-defined concept.
Here and hereafter the term “convergence” will be used to mean convergence
to a finite limit. Thus it makes sense to say: for every $\omega \in \Lambda$, where
$\Lambda \in \mathcal{F}$, the sequence $\{X_n(\omega)\}$ converges. The limit is then a finite-valued
r.v. (see Theorem 3.1.6), say $X(\omega)$, defined on $\Lambda$. If $\Omega = \Lambda$, then we have
“convergence everywhere”, but a more useful concept is the following one.


DEFINITION OF CONVERGENCE “ALMOST EVERYWHERE” (a.e.). The sequence of
r.v.’s $\{X_n\}$ is said to converge almost everywhere [to the r.v. X] iff there exists
a null set N such that

(1) $\forall \omega \in \Omega \setminus N$: $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ finite.


Recall that our convention stated at the beginning of Sec. 3.1 allows
each r.v. a null set on which it may be $\pm\infty$. The union of all these sets being
still a null set, it may be included in the set N in (1) without modifying the
conclusion. This type of trivial consideration makes it possible, when dealing
with a countable set of r.v.’s that are finite a.e., to regard them as finite
everywhere.

The reader should learn at an early stage to reason with a single sample
point $\omega_0$ and the corresponding sample sequence $\{X_n(\omega_0), n \ge 1\}$ as a numerical
sequence, and translate the inferences on the latter into probability statements
about sets of $\omega$. The following characterization of convergence a.e. is
a good illustration of this method.


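Theorem 4.1.1. The sequence $\{X_n\}$ converges a.e. to X if and only if for
every $\epsilon > 0$ we have

(2) $\lim_{m \to \infty} \mathcal{P}\{|X_n - X| \le \epsilon \text{ for all } n \ge m\} = 1$;

or equivalently

(2’) $\lim_{m \to \infty} \mathcal{P}\{|X_n - X| > \epsilon \text{ for some } n \ge m\} = 0$.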


PROOF. Suppose there is convergence a.e. and let $\Omega_0 = \Omega \setminus N$ where N is
as in (1). For $m \ge 1$ let us denote by $A_m(\epsilon)$ the event exhibited in (2), namely:

(3) $A_m(\epsilon) = \bigcap_{n=m}^{\infty} \{|X_n - X| \le \epsilon\}$.

Then $A_m(\epsilon)$ is increasing with m. For each $\omega_0 \in \Omega_0$, the convergence of $\{X_n(\omega_0)\}$
to $X(\omega_0)$ implies that given any $\epsilon > 0$, there exists $m(\omega_0, \epsilon)$ such that

(4) $n \ge m(\omega_0, \epsilon) \Rightarrow |X_n(\omega_0) - X(\omega_0)| \le \epsilon$.

Hence each such $\omega_0$ belongs to some $A_m(\epsilon)$ and so $\Omega_0 \subset \bigcup_{m=1}^{\infty} A_m(\epsilon)$.
It follows from the monotone convergence property of a measure that
$\lim_{m \to \infty} \mathcal{P}(A_m(\epsilon)) = 1$, which is equivalent to (2).

Conversely, suppose (2) holds, then we see above that the set $A(\epsilon) =
\bigcup_{m=1}^{\infty} A_m(\epsilon)$ has probability equal to one. For any $\omega_0 \in A(\epsilon)$, (4) is true for
the given $\epsilon$. Let $\epsilon$ run through a sequence of values decreasing to zero, for
instance {1/n}. Then the set

$A = \bigcap_{n=1}^{\infty} A\left(\frac{1}{n}\right)$

still has probability one since

$\mathcal{P}(A) = \lim_{n} \mathcal{P}\left( A\left(\frac{1}{n}\right) \right)$.

If $\omega_0$ belongs to A, then (4) is true for all $\epsilon = 1/n$, hence for all $\epsilon > 0$ (why?).
This means $\{X_n(\omega_0)\}$ converges to $X(\omega_0)$ for all $\omega_0$ in a set of probability one.


A weaker concept of convergence is of basic importance in probability 
theory. 


DEFINITION OF CONVERGENCE “IN PROBABILITY” (in pr.). The sequence $\{X_n\}$
is said to converge in probability to X iff for every $\epsilon > 0$ we have

(5) $\lim_{n \to \infty} \mathcal{P}\{|X_n - X| > \epsilon\} = 0$.

Strictly speaking, the definition applies when all $X_n$ and X are finite-valued.
But we may extend it to r.v.’s that are finite a.e. either by agreeing
to ignore a null set or by the logical convention that a formula must first be
defined in order to be valid or invalid. Thus, for example, if $X_n(\omega) = +\infty$
and $X(\omega) = +\infty$ for some $\omega$, then $X_n(\omega) - X(\omega)$ is not defined and therefore
such an $\omega$ cannot belong to the set $\{|X_n - X| > \epsilon\}$ figuring in (5).

Since (2’) clearly implies (5), we have the immediate consequence below. 


Theorem 4.1.2. Convergence a.e. [to X] implies convergence in pr. [to X]. 


Sometimes we have to deal with questions of convergence when no limit
is in evidence. For convergence a.e. this is immediately reducible to the numerical
case where the Cauchy criterion is applicable. Specifically, $\{X_n\}$ converges
a.e. iff there exists a null set N such that for every $\omega \in \Omega \setminus N$ and every $\epsilon > 0$,
there exists $m(\omega, \epsilon)$ such that

$n' > n \ge m(\omega, \epsilon) \Rightarrow |X_n(\omega) - X_{n'}(\omega)| \le \epsilon$.


The following analogue of Theorem 4.1.1 is left to the reader. 


Theorem 4.1.3. The sequence $\{X_n\}$ converges a.e. if and only if for every $\epsilon$
we have

(6) $\lim_{m \to \infty} \mathcal{P}\{|X_n - X_{n'}| > \epsilon \text{ for some } n' > n \ge m\} = 0$.


For convergence in pr., the obvious analogue of (6) is

(7) $\lim_{n, n' \to \infty} \mathcal{P}\{|X_n - X_{n'}| > \epsilon\} = 0$.

It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a
finite r.v. X such that $X_n \to X$ in pr.

DEFINITION OF CONVERGENCE IN $L^p$, $0 < p < \infty$. The sequence $\{X_n\}$ is said
to converge in $L^p$ to X iff $X_n \in L^p$, $X \in L^p$ and

(8) $\lim_{n \to \infty} \mathcal{E}(|X_n - X|^p) = 0$.


In all these definitions above, $X_n$ converges to X if and only if $X_n - X$
converges to 0. Hence there is no loss of generality if we put $X \equiv 0$ in the
discussion, provided that any hypothesis involved can be similarly reduced
to this case. We say that X is dominated by Y if $|X| \le Y$ a.e., and that the
sequence $\{X_n\}$ is dominated by Y iff this is true for each $X_n$ with the same
Y. We say that X or $\{X_n\}$ is uniformly bounded iff the Y above may be taken
to be a constant.


Theorem 4.1.4. If $X_n$ converges to 0 in $L^p$, then it converges to 0 in pr.
The converse is true provided that $\{X_n\}$ is dominated by some Y that belongs
to $L^p$.


Remark. If $X_n \to X$ in $L^p$, and $\{X_n\}$ is dominated by Y, then $\{X_n - X\}$
is dominated by $Y + |X|$, which is in $L^p$. Hence there is no loss of generality
to assume $X \equiv 0$.

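PROOF. By Chebyshev inequality with $\varphi(x) = |x|^p$, we have

(9) $\mathcal{P}\{|X_n| \ge \epsilon\} \le \frac{\mathcal{E}(|X_n|^p)}{\epsilon^p}$.

Letting $n \to \infty$, the right member → 0 by hypothesis, hence so does the left,
which is equivalent to (5) with X = 0. This proves the first assertion. If now
$|X_n| \le Y$ a.e. with $\mathcal{E}(Y^p) < \infty$, then we have

$\mathcal{E}(|X_n|^p) = \int_{\{|X_n| < \epsilon\}} |X_n|^p\, d\mathcal{P} + \int_{\{|X_n| \ge \epsilon\}} |X_n|^p\, d\mathcal{P} \le \epsilon^p + \int_{\{|X_n| \ge \epsilon\}} Y^p\, d\mathcal{P}$.

Since $\mathcal{P}\{|X_n| \ge \epsilon\} \to 0$, the last-written integral → 0 by Exercise 2 in
Sec. 3.2. Letting first $n \to \infty$ and then $\epsilon \to 0$, we obtain $\mathcal{E}(|X_n|^p) \to 0$; hence
$X_n$ converges to 0 in $L^p$.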


As a corollary, for a uniformly bounded sequence $\{X_n\}$ convergence in
pr. and in $L^p$ are equivalent. The general result is as follows.


Theorem 4.1.5. $X_n \to 0$ in pr. if and only if

(10) $\mathcal{E}\left( \frac{|X_n|}{1 + |X_n|} \right) \to 0$.

Furthermore, the functional $\rho(\cdot, \cdot)$ given by

$\rho(X, Y) = \mathcal{E}\left( \frac{|X - Y|}{1 + |X - Y|} \right)$

is a metric in the space of r.v.’s, provided that we identify r.v.’s that are
equal a.e.


PROOF. If $\rho(X, Y) = 0$, then $\mathcal{E}(|X - Y|) = 0$, hence X = Y a.e. by
Exercise 1 of Sec. 3.2. To show that $\rho(\cdot, \cdot)$ is metric it is sufficient to show that

$\mathcal{E}\left( \frac{|X - Y|}{1 + |X - Y|} \right) \le \mathcal{E}\left( \frac{|X|}{1 + |X|} \right) + \mathcal{E}\left( \frac{|Y|}{1 + |Y|} \right)$.

For this we need only verify that for every real x and y:

(11) $\frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}$.

By symmetry we may suppose that $|y| \le |x|$; then

(12) $\frac{|x + y|}{1 + |x + y|} - \frac{|x|}{1 + |x|} = \frac{|x + y| - |x|}{(1 + |x + y|)(1 + |x|)} \le \frac{\big| |x + y| - |x| \big|}{1 + |x|} \le \frac{|y|}{1 + |y|}$.

For any X the r.v. $|X|/(1 + |X|)$ is bounded by 1, hence by the second
part of Theorem 4.1.4 the first assertion of the theorem will follow if we show
that $|X_n| \to 0$ in pr. if and only if $|X_n|/(1 + |X_n|) \to 0$ in pr. But $|x| \le \epsilon$ is
equivalent to $|x|/(1 + |x|) \le \epsilon/(1 + \epsilon)$; hence the proof is complete.
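
The elementary inequality (11) can also be checked numerically. The sketch below is an illustration, not a proof: the grid of test points is an arbitrary choice made here, and `g` is a name introduced for the map $t \mapsto |t|/(1 + |t|)$.

```python
import itertools

def g(t):
    """The bounded map t -> |t| / (1 + |t|) appearing in (10) and (11)."""
    t = abs(t)
    return t / (1.0 + t)

# Grid of test points (an arbitrary choice for illustration).
grid = [k / 4.0 for k in range(-40, 41)]

# Largest value of g(x + y) - g(x) - g(y) over the grid; (11) says it is <= 0.
worst = max(g(x + y) - g(x) - g(y) for x, y in itertools.product(grid, grid))
print(worst <= 1e-12)
```

Equality in (11) occurs on this grid only when x or y is zero, which is why the maximum is attained at 0 rather than at a strictly negative value.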


Example 1. Convergence in pr. does not imply convergence in $L^p$, and the latter
does not imply convergence a.e.

Take the probability space $(\Omega, \mathcal{F}, \mathcal{P})$ to be $(\mathcal{U}, \mathcal{B}, m)$ as in Example 2 of
Sec. 2.2. Let $\varphi_{kj}$ be the indicator of the interval

$\left( \frac{j-1}{k}, \frac{j}{k} \right], \quad k \ge 1, \ 1 \le j \le k.$

Order these functions lexicographically first according to k increasing, and then for
each k according to j increasing, into one sequence $\{X_n\}$ so that if $X_n = \varphi_{k_n j_n}$, then
$k_n \to \infty$ as $n \to \infty$. Thus for each p > 0:

$\mathcal{E}(X_n^p) = \frac{1}{k_n} \to 0,$

and so $X_n \to 0$ in $L^p$. But for each $\omega$ and every k, there exists a j such that $\varphi_{kj}(\omega) = 1$;
hence there exist infinitely many values of n such that $X_n(\omega) = 1$. Similarly there exist
infinitely many values of n such that $X_n(\omega) = 0$. It follows that the sequence $\{X_n(\omega)\}$
of 0’s and 1’s cannot converge for any $\omega$. In other words, the set on which $\{X_n\}$
converges is empty.
Now if we replace $\varphi_{kj}$ by $k^{1/p} \varphi_{kj}$, where p > 0, then $\mathcal{P}\{X_n > 0\} = 1/k_n \to 0$
so that $X_n \to 0$ in pr., but for each n, we have $\mathcal{E}(X_n^p) = 1$. Consequently
$\lim_n \mathcal{E}(|X_n - 0|^p) = 1$ and $X_n$ does not → 0 in $L^p$.
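
The first part of Example 1 can be made concrete with a short computation. The sketch below assumes the intervals are ((j − 1)/k, j/k] with 1 ≤ j ≤ k on (0, 1], ordered block by block; the function names `typewriter` and `X` and the sample point `omega` are introduced here for illustration.

```python
from fractions import Fraction

def typewriter(n):
    """Return (k, j) such that X_n is the indicator of ((j-1)/k, j/k]."""
    k = 1
    while n > k:          # block k holds k consecutive indices
        n -= k
        k += 1
    return k, n

def X(n, omega):
    k, j = typewriter(n)
    return 1 if Fraction(j - 1, k) < omega <= Fraction(j, k) else 0

omega = Fraction(1, 3)                          # an arbitrary sample point in (0, 1]
values = [X(n, omega) for n in range(1, 56)]    # indices covering blocks k = 1..10
expectations = [Fraction(1, typewriter(n)[0]) for n in (1, 10, 55)]  # E(X_n^p) = 1/k_n
```

Within every block exactly one term equals 1 at the chosen sample point, so the sample sequence keeps returning to 1 while the expectations 1/k_n decrease to 0.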


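Example 2. Convergence a.e. does not imply convergence in $L^p$.
In $(\mathcal{U}, \mathcal{B}, m)$ define

$X_n(\omega) = \begin{cases} 2^n, & \text{if } \omega \in \left(0, \frac{1}{n}\right); \\ 0, & \text{otherwise.} \end{cases}$

Then $\mathcal{E}(|X_n|^p) = 2^{np}/n \to +\infty$ for each p > 0, but $X_n \to 0$ everywhere.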
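
A minimal numerical sketch of Example 2 follows, assuming the interval in the definition is (0, 1/n) and restricting to integer p for exact arithmetic. The names `X`, `moment`, and `omega` are illustrative choices made here.

```python
from fractions import Fraction

def X(n, omega):
    """X_n(omega) = 2^n on (0, 1/n), and 0 otherwise."""
    return 2 ** n if 0 < omega < Fraction(1, n) else 0

def moment(n, p):
    """E(|X_n|^p) = 2^(n p) * m((0, 1/n)), for an integer p >= 1."""
    return Fraction(2 ** (n * p), n)

omega = Fraction(1, 7)                        # an arbitrary fixed sample point
path = [X(n, omega) for n in range(1, 12)]    # sample sequence X_1(omega), ..., X_11(omega)
moments = [moment(n, 1) for n in (1, 5, 10)]  # first moments grow without bound
```

For a fixed sample point the sequence is eventually 0 (here from n = 7 on), while the moments increase geometrically.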


The reader should not be misled by these somewhat “artificial” examples
to think that such counterexamples are rare or abnormal. Natural examples
abound in more advanced theory, such as that of stochastic processes, but
they are not as simple as those discussed above. Here are some culled from
later chapters, tersely stated. In the symmetrical Bernoullian random walk
$\{S_n, n \ge 1\}$, let $\zeta_n = 1_{\{S_n = 0\}}$. Then $\lim_{n \to \infty} \mathcal{E}(\zeta_n^p) = 0$ for every p > 0, but
$\mathcal{P}\{\lim_{n \to \infty} \zeta_n \text{ exists}\} = 0$ because of intermittent return of $S_n$ to the origin (see
Sec. 8.3). This kind of example can be formulated for any recurrent process
such as a Brownian motion. On the other hand, if $\{\tilde{\zeta}_n, n \ge 1\}$ denotes the
random walk above stopped at 1, then $\mathcal{E}(\tilde{\zeta}_n) = 0$ for all n but $\mathcal{P}\{\lim_{n \to \infty} \tilde{\zeta}_n =
1\} = 1$. The same holds for the martingale that consists in tossing a fair
coin and doubling the stakes until the first head (see Sec. 9.4). Another
striking example is furnished by the simple Poisson process $\{N(t), t \ge 0\}$ (see
Sec. 5.5). If $\zeta(t) = N(t)/t$, then $\mathcal{E}(\zeta(t)) = \lambda$ for all t > 0; but $\mathcal{P}\{\lim_{t \downarrow 0} \zeta(t) =
0\} = 1$ because for almost every $\omega$, $N(t, \omega) = 0$ for all sufficiently small values
of t. The continuous parameter may be replaced by a sequence $t_n \downarrow 0$.

Finally, we mention another kind of convergence which is basic in functional
analysis, but confine ourselves to $L^1$. The sequence of r.v.’s $\{X_n\}$ in $L^1$
is said to converge weakly in $L^1$ to X iff for each bounded r.v. Y we have

$\lim_{n \to \infty} \mathcal{E}(X_n Y) = \mathcal{E}(XY)$, finite.

It is easy to see that $X \in L^1$ and is unique up to equivalence by taking
$Y = 1_{\{X \ne X'\}}$ if X′ is another candidate. Clearly convergence in $L^1$ defined
above implies weak convergence; hence the former is sometimes referred to
as “strong”. On the other hand, Example 2 above shows that convergence
a.e. does not imply weak convergence; whereas Exercises 19 and 20 below
show that weak convergence does not imply convergence in pr. (nor even in
distribution; see Sec. 4.4).

EXERCISES

1. $X_n \to +\infty$ a.e. if and only if $\forall M > 0$: $\mathcal{P}\{X_n < M \text{ i.o.}\} = 0$.

2. If $0 \le X_n$, $X_n \le X \in L^1$ and $X_n \to X$ in pr., then $X_n \to X$ in $L^1$.

3. If $X_n \to X$, $Y_n \to Y$ both in pr., then $X_n \pm Y_n \to X \pm Y$, $X_n Y_n \to XY$,
all in pr.

*4. Let f be a bounded uniformly continuous function in $\mathcal{R}^1$. Then $X_n \to 0$
in pr. implies $\mathcal{E}\{f(X_n)\} \to f(0)$. [Example:

$f(x) = \frac{|x|}{1 + |x|}$

as in Theorem 4.1.5.]

5. Convergence in $L^p$ implies that in $L^r$ for r < p.

6. If $X_n \to X$, $Y_n \to Y$, both in $L^p$, then $X_n \pm Y_n \to X \pm Y$ in $L^p$. If
$X_n \to X$ in $L^p$ and $Y_n \to Y$ in $L^q$, where p > 1 and 1/p + 1/q = 1, then
$X_n Y_n \to XY$ in $L^1$.

7. If $X_n \to X$ in pr. and $X_n \to Y$ in pr., then X = Y a.e.

8. If $X_n \to X$ a.e. and $\mu_n$ and $\mu$ are the p.m.’s of $X_n$ and X, it does not
follow that $\mu_n(P) \to \mu(P)$ even for all intervals P.

9. Give an example in which $\mathcal{E}(X_n) \to 0$ but there does not exist any
subsequence $\{n_k\} \to \infty$ such that $X_{n_k} \to 0$ in pr.

*10. Let f be a continuous function on $\mathcal{R}^1$. If $X_n \to X$ in pr., then
$f(X_n) \to f(X)$ in pr. The result is false if f is merely Borel measurable.
[HINT: Truncate f at ±A for large A.]

11. The extended-valued r.v. X is said to be bounded in pr. iff for each
$\epsilon > 0$, there exists a finite $M(\epsilon)$ such that $\mathcal{P}\{|X| \le M(\epsilon)\} \ge 1 - \epsilon$. Prove that
X is bounded in pr. if and only if it is finite a.e.

12. The sequence of extended-valued r.v.’s $\{X_n\}$ is said to be bounded in
pr. iff $\sup_n |X_n|$ is bounded in pr.; $\{X_n\}$ is said to diverge to +∞ in pr. iff for
each M > 0 and $\epsilon > 0$ there exists a finite $n_0(M, \epsilon)$ such that if $n > n_0$, then
$\mathcal{P}\{|X_n| > M\} > 1 - \epsilon$. Prove that if $\{X_n\}$ diverges to +∞ in pr. and $\{Y_n\}$ is
bounded in pr., then $\{X_n + Y_n\}$ diverges to +∞ in pr.

13. If $\sup_n X_n = +\infty$ a.e., there need exist no subsequence $\{X_{n_k}\}$ that
diverges to +∞ in pr.

14. It is possible that for each $\omega$, $\overline{\lim}_n X_n(\omega) = +\infty$, but there does
not exist a subsequence $\{n_k\}$ and a set $\Lambda$ of positive probability such that
$\lim_k X_{n_k}(\omega) = +\infty$ on $\Lambda$. [HINT: On $(\mathcal{U}, \mathcal{B})$ define $X_n(\omega)$ according to the
nth digit of $\omega$.]

*15. Instead of the $\rho$ in Theorem 4.1.5 one may define other metrics as
follows. Let $\rho_1(X, Y)$ be the infimum of all $\epsilon > 0$ such that

$\mathcal{P}(|X - Y| > \epsilon) \le \epsilon$.

Let $\rho_2(X, Y)$ be the infimum of $\mathcal{P}\{|X - Y| > \epsilon\} + \epsilon$ over all $\epsilon > 0$. Prove
that these are metrics and that convergence in pr. is equivalent to convergence
according to either metric.


*16. Convergence in pr. for arbitrary r.v.’s may be reduced to that of
bounded r.v.’s by the transformation

$X' = \arctan X$.

In a space of uniformly bounded r.v.’s, convergence in pr. is equivalent to
that in the metric $\rho_0(X, Y) = \mathcal{E}(|X - Y|)$; this reduces to the definition given
in Exercise 8 of Sec. 3.2 when X and Y are indicators.

17. Unlike convergence in pr., convergence a.e. is not expressible
by means of metric. [HINT: Metric convergence has the property that if
$\rho(x_n, x) \not\to 0$, then there exist $\epsilon > 0$ and $\{n_k\}$ such that $\rho(x_{n_k}, x) \ge \epsilon$ for
every k.]

18. If $X_n \downarrow X$ a.s., each $X_n$ is integrable and $\inf_n \mathcal{E}(X_n) > -\infty$, then
$X_n \to X$ in $L^1$.

19. Let $f_n(x) = 1 + \cos 2\pi n x$, $f(x) = 1$ in [0, 1]. Then for each $g \in L^1[0, 1]$
we have

$\int_0^1 f_n g\, dx \to \int_0^1 f g\, dx,$

but $f_n$ does not converge to f in measure. [HINT: This is just the
Riemann-Lebesgue lemma in Fourier series, but we have made $f_n \ge 0$ to
stress a point.]

20. Let $\{X_n\}$ be a sequence of independent r.v.’s with zero mean and
unit variance. Prove that for any bounded r.v. Y we have $\lim_{n \to \infty} \mathcal{E}(X_n Y) = 0$.
[HINT: Consider $\mathcal{E}\{[Y - \sum_{k=1}^{n} \mathcal{E}(X_k Y) X_k]^2\}$ to get Bessel’s inequality $\mathcal{E}(Y^2) \ge
\sum_{k=1}^{n} \mathcal{E}(X_k Y)^2$. The stated result extends to the case where the $X_n$’s are assumed
only to be uniformly integrable (see Sec. 4.5) but now we must approximate
Y by a function of $(X_1, \ldots, X_m)$, cf. Theorem 8.1.1.]


4.2 Almost sure convergence; Borel-Cantelli lemma


An important concept in set theory is that of the “lim sup” and “lim inf” of
a sequence of sets. These notions can be defined for subsets of an arbitrary
space $\Omega$.

DEFINITION. Let $E_n$ be any sequence of subsets of $\Omega$; we define

$\limsup_n E_n = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n, \qquad \liminf_n E_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} E_n.$

Let us observe at once that

(1) $\liminf_n E_n = (\limsup_n E_n^c)^c$,

so that in a sense one of the two notions suffices, but it is convenient to
employ both. The main properties of these sets will be given in the following
two propositions.


(i) A point belongs to $\limsup_n E_n$ if and only if it belongs to infinitely
many terms of the sequence $\{E_n, n \ge 1\}$. A point belongs to $\liminf_n E_n$ if
and only if it belongs to all terms of the sequence from a certain term on.
It follows in particular that both sets are independent of the enumeration of
the $E_n$’s.


PROOF. We shall prove the first assertion. Since a point belongs to
infinitely many of the $E_n$’s if and only if it does not belong to all $E_n^c$ from
a certain value of n on, the second assertion will follow from (1). Now if $\omega$
belongs to infinitely many of the $E_n$’s, then it belongs to

$F_m = \bigcup_{n=m}^{\infty} E_n$ for every m;

hence it belongs to

$\bigcap_{m=1}^{\infty} F_m = \limsup_n E_n.$

Conversely, if $\omega$ belongs to $\bigcap_{m=1}^{\infty} F_m$, then $\omega \in F_m$ for every m. Were $\omega$ to
belong to only a finite number of the $E_n$’s there would be an m such that
$\omega \notin E_n$ for $n \ge m$, so that

$\omega \notin \bigcup_{n=m}^{\infty} E_n = F_m.$

This contradiction proves that $\omega$ must belong to an infinite number of the $E_n$’s.
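
The characterization in (i) is easy to test on a periodic sequence of sets, for which every tail union equals the union over one period and every tail intersection equals the intersection over one period. The sketch below relies on that periodicity assumption; the helper name `lim_sup_inf` and the example sets are introduced here for illustration.

```python
def lim_sup_inf(period):
    """lim sup and lim inf of the periodic sequence E_n = period[n % len(period)].

    For a periodic sequence every tail union (intersection) equals the union
    (intersection) over one period, so the infinite operations become finite.
    """
    sets = [frozenset(s) for s in period]
    return frozenset.union(*sets), frozenset.intersection(*sets)

# Hypothetical example: E_n alternates between {1, 2} and {2, 3}.
sup, inf = lim_sup_inf([{1, 2}, {2, 3}])
print(sorted(sup), sorted(inf))
```

The points 1 and 3 belong to infinitely many terms but not to all terms from some index on, so they lie in the lim sup only; the point 2 belongs to every term and lies in both sets.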


In more intuitive language: the event $\limsup_n E_n$ occurs if and only if the
events $E_n$ occur infinitely often. Thus we may write

$\mathcal{P}(\limsup_n E_n) = \mathcal{P}(E_n \text{ i.o.})$

where the abbreviation “i.o.” stands for “infinitely often”. The advantage of
such a notation is better shown if we consider, for example, the events “$|X_n| > \epsilon$”
and the probability $\mathcal{P}\{|X_n| > \epsilon \text{ i.o.}\}$; see Theorem 4.2.3 below.

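(ii) If each $E_n \in \mathcal{F}$, then we have

(2) $\mathcal{P}(\limsup_n E_n) = \lim_{m \to \infty} \mathcal{P}\left( \bigcup_{n=m}^{\infty} E_n \right)$;

(3) $\mathcal{P}(\liminf_n E_n) = \lim_{m \to \infty} \mathcal{P}\left( \bigcap_{n=m}^{\infty} E_n \right)$.


PROOF. Using the notation in the preceding proof, it is clear that $F_m$
decreases as m increases. Hence by the monotone property of p.m.’s:

$\mathcal{P}\left( \bigcap_{m=1}^{\infty} F_m \right) = \lim_{m \to \infty} \mathcal{P}(F_m),$

which is (2); (3) is proved similarly or via (1).
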
Theorem 4.2.1. We have for arbitrary events $\{E_n\}$:

(4) $\sum_n \mathcal{P}(E_n) < \infty \Rightarrow \mathcal{P}(E_n \text{ i.o.}) = 0$.


PROOF. By Boole’s inequality for p.m.’s, we have

$\mathcal{P}(F_m) \le \sum_{n=m}^{\infty} \mathcal{P}(E_n).$

Hence the hypothesis in (4) implies that $\mathcal{P}(F_m) \to 0$, and the conclusion in
(4) now follows by (2).


As an illustration of the convenience of the new notions, we may restate 
Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the 
point being stressed here. 


Theorem 4.2.2. $X_n \to 0$ a.e. if and only if

(5) $\forall \epsilon > 0$: $\mathcal{P}\{|X_n| > \epsilon \text{ i.o.}\} = 0$.


PROOF. Using the notation $A_m = \bigcap_{n=m}^{\infty} \{|X_n| \le \epsilon\}$ as in (3) of Sec. 4.1
(with X = 0), we have

$\{|X_n| > \epsilon \text{ i.o.}\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|X_n| > \epsilon\} = \bigcap_{m=1}^{\infty} A_m^c.$

According to Theorem 4.1.1, $X_n \to 0$ a.e. if and only if for each $\epsilon > 0$,
$\mathcal{P}(A_m^c) \to 0$ as $m \to \infty$; since $A_m^c$ decreases as m increases, this is equivalent
to (5) as asserted.


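Theorem 4.2.3. If $X_n \to X$ in pr., then there exists a sequence $\{n_k\}$ of integers
increasing to infinity such that $X_{n_k} \to X$ a.e. Briefly stated: convergence
in pr. implies convergence a.e. along a subsequence.


PROOF. We may suppose $X \equiv 0$ as explained before. Then the hypothesis
may be written as

$\forall k > 0$: $\lim_{n \to \infty} \mathcal{P}\left( |X_n| > \frac{1}{2^k} \right) = 0.$

It follows that for each k we can find $n_k$ such that

$\mathcal{P}\left( |X_{n_k}| > \frac{1}{2^k} \right) \le \frac{1}{2^k}$

and consequently

$\sum_k \mathcal{P}\left( |X_{n_k}| > \frac{1}{2^k} \right) \le \sum_k \frac{1}{2^k} < \infty.$

Having so chosen $\{n_k\}$, we let $E_k$ be the event “$|X_{n_k}| > 1/2^k$”. Then we have
by (4):

$\mathcal{P}\left( |X_{n_k}| > \frac{1}{2^k} \text{ i.o.} \right) = 0.$

[Note: Here the index involved in “i.o.” is k; such ambiguity being harmless
and expedient.] This implies $X_{n_k} \to 0$ a.e. (why?) finishing the proof.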
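
The selection of the subsequence in this proof can be carried out explicitly for the sequence of Example 1 in Sec. 4.1, which converges to 0 in pr. but not a.e. The sketch below assumes that sequence is ordered block by block, with block k containing k indicators of intervals of length 1/k; the function names are illustrative.

```python
from fractions import Fraction

def prob_exceeds(n):
    """P(|X_n| > 2^-k) for the sequence of Example 1 in Sec. 4.1.

    X_n is an indicator, so for any k >= 1 this probability is 1/k_n, where
    k_n is the block containing the index n (blocks have lengths 1, 2, 3, ...).
    """
    k = 1
    while n > k:
        n -= k
        k += 1
    return Fraction(1, k)

def choose_subsequence(K):
    """Pick n_1 < n_2 < ... < n_K with P(|X_{n_k}| > 2^-k) <= 2^-k."""
    indices, n = [], 1
    for k in range(1, K + 1):
        while prob_exceeds(n) > Fraction(1, 2 ** k):
            n += 1
        indices.append(n)
        n += 1              # force the indices to increase strictly
    return indices

subseq = choose_subsequence(4)
```

Each chosen index is the first one in block $2^k$, whose interval is $(0, 2^{-k}]$; hence for every fixed $\omega$ in (0, 1] the selected terms vanish from some k on, in agreement with the theorem.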


EXERCISES 


1. Prove that

$\mathcal{P}(\limsup_n E_n) \ge \overline{\lim}_n\, \mathcal{P}(E_n),$

$\mathcal{P}(\liminf_n E_n) \le \underline{\lim}_n\, \mathcal{P}(E_n).$

2. Let $\{B_n\}$ be a countable collection of Borel sets in $\mathcal{U}$. If there exists
a $\delta > 0$ such that $m(B_n) \ge \delta$ for every n, then there is at least one point in $\mathcal{U}$
that belongs to infinitely many $B_n$’s.

3. If $\{X_n\}$ converges to a finite limit a.e., then for any $\epsilon$ there exists
$M(\epsilon) < \infty$ such that $\mathcal{P}\{\sup_n |X_n| \le M(\epsilon)\} \ge 1 - \epsilon$.

*4. For any sequence of r.v.’s $\{X_n\}$ there exists a sequence of constants
$\{A_n\}$ such that $X_n/A_n \to 0$ a.e.


*5. If $\{X_n\}$ converges on the set C, then for any $\epsilon > 0$, there exists
$C_0 \subset C$ with $\mathcal{P}(C \setminus C_0) < \epsilon$ such that $X_n$ converges uniformly in $C_0$. [This
is Egorov’s theorem. We may suppose $C = \Omega$ and the limit to be zero. Let
$F_{mk} = \bigcap_{n=k}^{\infty} \{\omega: |X_n(\omega)| \le 1/m\}$; then $\forall m$, $\exists k(m)$ such that

$\mathcal{P}(F_{m, k(m)}) > 1 - \epsilon/2^m.$

Take $C_0 = \bigcap_{m=1}^{\infty} F_{m, k(m)}$.]

6. Cauchy convergence of $\{X_n\}$ in pr. (or in $L^p$) implies the existence of
an X (finite a.e.), such that $X_n$ converges to X in pr. (or in $L^p$). [HINT: Choose
$n_k$ so that

$\sum_k \mathcal{P}\left\{ |X_{n_{k+1}} - X_{n_k}| > \frac{1}{2^k} \right\} < \infty;$

cf. Theorem 4.2.3.]

*7. $\{X_n\}$ converges in pr. to X if and only if every subsequence
$\{X_{n_k}\}$ contains a further subsequence that converges a.e. to X. [HINT: Use
Theorem 4.1.5.]

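8. Let $\{X_n, n \ge 1\}$ be any sequence of functions on $\Omega$ to $\mathcal{R}^1$ and let
C denote the set of $\omega$ for which the numerical sequence $\{X_n(\omega), n \ge 1\}$
converges. Show that

$C = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \left\{ |X_n - X_{n'}| \le \frac{1}{m} \right\} = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \Lambda(m, n, n')$

where

$\Lambda(m, n, n') = \left\{ \omega: \max_{n < j < k \le n'} |X_j(\omega) - X_k(\omega)| \le \frac{1}{m} \right\}.$

Hence if the $X_n$’s are r.v.’s, we have

$\mathcal{P}(C) = \lim_{m \to \infty} \lim_{n \to \infty} \lim_{n' \to \infty} \mathcal{P}(\Lambda(m, n, n')).$
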


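9. As in Exercise 8 show that

$\{\omega: \lim_{n \to \infty} X_n(\omega) = 0\} = \bigcap_{m=1}^{\infty} \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} \left\{ |X_n| \le \frac{1}{m} \right\}.$
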

10. Suppose that for a < b, we have

$\mathcal{P}\{X_n < a \text{ i.o. and } X_n > b \text{ i.o.}\} = 0;$

then $\lim_{n \to \infty} X_n$ exists a.e. but it may be infinite. [HINT: Consider all pairs of
rational numbers (a, b) and take a union over them.]


Under the assumption of independence, Theorem 4.2.1 has a striking 
complement. 


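Theorem 4.2.4. If the events $\{E_n\}$ are independent, then

(6) $\sum_n \mathcal{P}(E_n) = \infty \Rightarrow \mathcal{P}(E_n \text{ i.o.}) = 1$.


PROOF. By (3) we have

(7) $\mathcal{P}\{\liminf_n E_n^c\} = \lim_{m \to \infty} \mathcal{P}\left( \bigcap_{n=m}^{\infty} E_n^c \right)$.

The events $\{E_n^c\}$ are independent as well as $\{E_n\}$, by Exercise 3 of
Sec. 3.3; hence we have if m′ > m:

$\mathcal{P}\left( \bigcap_{n=m}^{m'} E_n^c \right) = \prod_{n=m}^{m'} \mathcal{P}(E_n^c) = \prod_{n=m}^{m'} (1 - \mathcal{P}(E_n)).$

Now for any $x \ge 0$, we have $1 - x \le e^{-x}$; it follows that the last term above
does not exceed

$\prod_{n=m}^{m'} e^{-\mathcal{P}(E_n)} = \exp\left( -\sum_{n=m}^{m'} \mathcal{P}(E_n) \right).$

Letting $m' \to \infty$, the right member above → 0, since the series in the
exponent → +∞ by hypothesis. It follows by monotone property that

$\mathcal{P}\left( \bigcap_{n=m}^{\infty} E_n^c \right) = \lim_{m' \to \infty} \mathcal{P}\left( \bigcap_{n=m}^{m'} E_n^c \right) = 0.$

Thus the right member in (7) is equal to 0, and consequently so is the left
member in (7). This is equivalent to $\mathcal{P}(E_n \text{ i.o.}) = 1$ by (1).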
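
The two estimates used in this proof can be followed on a concrete divergent series. The sketch below assumes independent events with $\mathcal{P}(E_n) = 1/n$ (a choice made here for illustration); the finite product then telescopes to (m − 1)/m′, and it is compared with the exponential bound.

```python
from fractions import Fraction
import math

def prob_none_occur(m, m_prime):
    """P(no E_n occurs for m <= n <= m') when the E_n are independent, P(E_n) = 1/n."""
    prod = Fraction(1)
    for n in range(m, m_prime + 1):
        prod *= 1 - Fraction(1, n)
    return prod

def exp_bound(m, m_prime):
    """The bound exp(-sum_{n=m}^{m'} P(E_n)) used in the proof."""
    return math.exp(-sum(1.0 / n for n in range(m, m_prime + 1)))

m = 5
exact = [prob_none_occur(m, mp) for mp in (10, 100, 1000)]
bounds = [exp_bound(m, mp) for mp in (10, 100, 1000)]
```

Both quantities tend to 0 as m′ grows, the exact product staying below the exponential bound, as the inequality $1 - x \le e^{-x}$ requires.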


Theorems 4.2.1 and 4.2.4 together will be referred to as the
Borel-Cantelli lemma, the former the “convergence part” and the latter “the
divergence part”. The first is more useful since the events there may be
completely arbitrary. The second has an extension to pairwise independent
r.v.’s; although the result is of some interest, it is the method of proof to be
given below that is more important. It is a useful technique in probability
theory.

Theorem 4.2.5. The implication (6) remains true if the events $\{E_n\}$ are pairwise
independent.


PROOF. Let $I_n$ denote the indicator of $E_n$, so that our present hypothesis
becomes

(8) $\forall m \ne n$: $\mathcal{E}(I_m I_n) = \mathcal{E}(I_m)\mathcal{E}(I_n)$.

Consider the series of r.v.’s: $\sum_{n=1}^{\infty} I_n(\omega)$. It diverges to +∞ if and only if
an infinite number of its terms are equal to one, namely if $\omega$ belongs to an
infinite number of the $E_n$’s. Hence the conclusion in (6) is equivalent to

(9) $\mathcal{P}\left\{ \sum_{n=1}^{\infty} I_n = +\infty \right\} = 1$.

What has been said so far is true for arbitrary $E_n$’s. Now the hypothesis in
(6) may be written as

$\sum_{n=1}^{\infty} \mathcal{E}(I_n) = +\infty.$

Consider the partial sum $J_k = \sum_{n=1}^{k} I_n$. Using Chebyshev’s inequality, we
have for every A > 0:

(10) $\mathcal{P}\{|J_k - \mathcal{E}(J_k)| \le A\sigma(J_k)\} \ge 1 - \frac{\sigma^2(J_k)}{A^2 \sigma^2(J_k)} = 1 - \frac{1}{A^2}$,


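where $\sigma^2(J)$ denotes the variance of J. Writing

$p_n = \mathcal{E}(I_n) = \mathcal{P}(E_n),$

we may calculate $\sigma^2(J_k)$ by using (8), as follows:

$\mathcal{E}(J_k^2) = \mathcal{E}\left( \sum_{n=1}^{k} I_n^2 + 2 \sum_{1 \le m < n \le k} I_m I_n \right)$

$= \sum_{n=1}^{k} \mathcal{E}(I_n^2) + 2 \sum_{1 \le m < n \le k} \mathcal{E}(I_m)\mathcal{E}(I_n)$

$= \sum_{n=1}^{k} \mathcal{E}(I_n)^2 + 2 \sum_{1 \le m < n \le k} \mathcal{E}(I_m)\mathcal{E}(I_n) + \sum_{n=1}^{k} \{\mathcal{E}(I_n) - \mathcal{E}(I_n)^2\}$

$= \left( \sum_{n=1}^{k} p_n \right)^2 + \sum_{n=1}^{k} (p_n - p_n^2).$

Hence

$\sigma^2(J_k) = \mathcal{E}(J_k^2) - \mathcal{E}(J_k)^2 = \sum_{n=1}^{k} (p_n - p_n^2) = \sum_{n=1}^{k} \sigma^2(I_n).$
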


This calculation will turn out to be a particular case of a simple basic formula;
see (6) of Sec. 5.1. Since $\sum_{n=1}^{k} p_n = \mathcal{E}(J_k) \to \infty$, it follows that

$\sigma(J_k) \le \mathcal{E}(J_k)^{1/2} = o(\mathcal{E}(J_k))$

in the classical “o, O” notation of analysis. Hence if $k > k_0(A)$, (10) implies

$\mathcal{P}\left\{ J_k > \frac{1}{2} \mathcal{E}(J_k) \right\} \ge 1 - \frac{1}{A^2}$

(where $\frac{1}{2}$ may be replaced by any constant < 1). Since $J_k$ increases with k
the inequality above holds a fortiori when the first $J_k$ there is replaced by
$\lim_{k \to \infty} J_k$; after that we can let $k \to \infty$ in $\mathcal{E}(J_k)$ to obtain

$\mathcal{P}\left\{ \lim_{k \to \infty} J_k = +\infty \right\} \ge 1 - \frac{1}{A^2}.$

Since the left member does not depend on A, and A is arbitrary, this implies
that $\lim_{k \to \infty} J_k = +\infty$ a.e., namely (9).
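
The variance identity for $J_k$ uses only pairwise independence. As a check, the sketch below enumerates the three events built from two fair ±1 coin tosses as in Exercise 2 of Sec. 3.3, which are pairwise but not totally independent; the helper names `indicators` and `expect` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair +/-1 coin tosses (X1, X2), each outcome with probability 1/4.
outcomes = list(product([1, -1], repeat=2))
prob = Fraction(1, 4)

def indicators(x1, x2):
    """Indicators of E1 = {X1 = 1}, E2 = {X2 = 1}, E3 = {X1 * X2 = 1}."""
    return (int(x1 == 1), int(x2 == 1), int(x1 * x2 == 1))

def expect(f):
    """Expectation of f(I1, I2, I3) under the uniform measure on the outcomes."""
    return sum(prob * f(*indicators(x1, x2)) for x1, x2 in outcomes)

p = [expect(lambda a, b, c, i=i: (a, b, c)[i]) for i in range(3)]
mean_J = expect(lambda a, b, c: a + b + c)
var_J = expect(lambda a, b, c: (a + b + c) ** 2) - mean_J ** 2
formula = sum(q - q * q for q in p)          # sum of p_n - p_n^2
triple = expect(lambda a, b, c: a * b * c)   # P(E1 E2 E3)
```

Here the variance of $J_3$ agrees with the sum of the individual variances even though $\mathcal{P}(E_1 E_2 E_3) = 1/4$ differs from the product 1/8.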


Corollary. If the events $\{E_n\}$ are pairwise independent, then

$\mathcal{P}(\limsup_n E_n) = 0$ or 1

according as $\sum_n \mathcal{P}(E_n) < \infty$ or $= \infty$.


This is an example of a “zero-or-one” law to be discussed in Chapter 8, 
though it is not included in any of the general results there. 


EXERCISES 


Below $X_n$, $Y_n$ are r.v.’s, $E_n$ events.

11. Give a trivial example of dependent $\{E_n\}$ satisfying the hypothesis
but not the conclusion of (6); give a less trivial example satisfying the
hypothesis but with $\mathcal{P}(\limsup_n E_n) = 0$. [HINT: Let $E_n$ be the event that a real number
in [0, 1] has its n-ary expansion begin with 0.]

*12. Prove that the probability of convergence of a sequence of indepen- 
dent r.v.’s is equal to zero or one. 

13. If $\{X_n\}$ is a sequence of independent and identically distributed r.v.’s
not constant a.e., then $\mathcal{P}\{X_n \text{ converges}\} = 0$.

*14. If $\{X_n\}$ is a sequence of independent r.v.’s with d.f.’s $\{F_n\}$, then
$\mathcal{P}\{\lim_n X_n = 0\} = 1$ if and only if $\forall \epsilon > 0$: $\sum_n \{1 - F_n(\epsilon) + F_n(-\epsilon)\} < \infty$.

15. If $\sum_n \mathcal{P}(|X_n| > n) < \infty$, then

$\limsup_n \frac{|X_n|}{n} \le 1$ a.e.


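*16. Strengthen Theorem 4.2.4 by proving that

$\lim_{n \to \infty} \frac{J_n}{\mathcal{E}(J_n)} = 1$ a.e.

[HINT: Take a subsequence $\{n_k\}$ such that $\mathcal{E}(J_{n_k}) \sim k^2$; prove the result first
for this subsequence by estimating $\mathcal{P}\{|J_k - \mathcal{E}(J_k)| > \delta \mathcal{E}(J_k)\}$; the general
case follows because if $n_k \le n \le n_{k+1}$,

$J_{n_k} / \mathcal{E}(J_{n_{k+1}}) \le J_n / \mathcal{E}(J_n) \le J_{n_{k+1}} / \mathcal{E}(J_{n_k}).$]

17. If $\mathcal{E}(X_n) = 1$ and $\mathcal{E}(X_n^2)$ is bounded in n, then

$\mathcal{P}\left\{ \overline{\lim}_{n \to \infty} X_n \ge 1 \right\} > 0.$

[This is due to Kochen and Stone. Truncate $X_n$ at A to $Y_n$ with A so large
that $\mathcal{E}(Y_n) > 1 - \epsilon$ for all n; then apply Exercise 6 of Sec. 3.2.]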


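*18. Let $\{E_n\}$ be events and $\{I_n\}$ their indicators. Prove the inequality

(11) $\mathcal{P}\left\{ \bigcup_{k=1}^{n} E_k \right\} \ge \left\{ \mathcal{E}\left( \sum_{k=1}^{n} I_k \right) \right\}^2 \Big/ \mathcal{E}\left\{ \left( \sum_{k=1}^{n} I_k \right)^2 \right\}.$

Deduce from this that if (i) $\sum_n \mathcal{P}(E_n) = \infty$ and (ii) there exists c > 0 such
that we have

$\forall m < n$: $\mathcal{P}(E_m E_n) \le c\, \mathcal{P}(E_m) \mathcal{P}(E_{n-m});$

then

$\mathcal{P}\{\limsup_n E_n\} > 0.$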
19. If $\sum_n \mathcal{P}(E_n) = \infty$ and

$\lim_n \left\{ \sum_{j=1}^{n} \sum_{k=1}^{n} \mathcal{P}(E_j E_k) \right\} \Big/ \left\{ \sum_{k=1}^{n} \mathcal{P}(E_k) \right\}^2 = 1,$

then $\mathcal{P}\{\limsup_n E_n\} = 1$. [HINT: Use (11) above.]

20. Let $\{E_n\}$ be arbitrary events satisfying

(i) $\lim_n \mathcal{P}(E_n) = 0$, (ii) $\sum_n \mathcal{P}(E_n E_{n+1}^c) < \infty$;

then $\mathcal{P}\{\limsup_n E_n\} = 0$. [This is due to Barndorff-Nielsen.]


4.3. Vague convergence 


If a sequence of r.v.’s $\{X_n\}$ tends to a limit, the corresponding sequence of
p.m.’s $\{\mu_n\}$ ought to tend to a limit in some sense. Is it true that $\lim_n \mu_n(A)$
exists for all $A \in \mathcal{B}^1$ or at least for all intervals A? The answer is no from
trivial examples.


Example 1. Let $X_n = c_n$ where the $c_n$’s are constants tending to zero. Then $X_n \to 0$
deterministically. For any interval I such that $0 \notin \bar{I}$, where $\bar{I}$ is the closure of I, we
have $\lim_n \mu_n(I) = 0 = \mu(I)$; for any interval such that $0 \in I^\circ$, where $I^\circ$ is the interior
of I, we have $\lim_n \mu_n(I) = 1 = \mu(I)$. But if $\{c_n\}$ oscillates between strictly positive
and strictly negative values, and I = (a, 0) or (0, b), where a < 0 < b, then $\mu_n(I)$
oscillates between 0 and 1, while $\mu(I) = 0$. On the other hand, if I = (a, 0] or [0, b),
then $\mu_n(I)$ oscillates as before but $\mu(I) = 1$. Observe that {0} is the sole atom of $\mu$
and it is the root of the trouble.
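
The oscillation described above can be exhibited directly. The sketch below assumes the particular constants $c_n = (-1)^n/n$ and the endpoint a = −1, both chosen here for illustration; an interval is encoded as a tuple (a, b, left_closed, right_closed).

```python
def mu_n(n, interval):
    """Mass that the point mass at c_n = (-1)^n / n gives to an interval.

    `interval` is (a, b, left_closed, right_closed); the choice of c_n is an
    illustrative assumption.
    """
    a, b, left_closed, right_closed = interval
    c = (-1) ** n / n
    above = c >= a if left_closed else c > a
    below = c <= b if right_closed else c < b
    return 1 if above and below else 0

def mu(interval):
    """Mass that the limit distribution (point mass at 0) gives to the interval."""
    a, b, left_closed, right_closed = interval
    above = 0 >= a if left_closed else 0 > a
    below = 0 <= b if right_closed else 0 < b
    return 1 if above and below else 0

open_left = (-1, 0, False, False)      # I = (a, 0) with a = -1
half_closed = (-1, 0, False, True)     # I = (a, 0]
masses_open = [mu_n(n, open_left) for n in range(2, 10)]
masses_closed = [mu_n(n, half_closed) for n in range(2, 10)]
```

The masses alternate between 0 and 1 for both intervals, while the limit measure assigns 0 to (a, 0) and 1 to (a, 0]; the difference lies entirely in the atom at 0.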

Instead of the point masses concentrated at $c_n$, we may consider, e.g., r.v.’s $\{X_n\}$
having uniform distributions over intervals $(c_n, c_n')$ where $c_n < 0 < c_n'$ and $c_n \to 0$,
$c_n' \to 0$. Then again $X_n \to 0$ a.e. but $\mu_n((a, 0))$ may not converge at all, or converge
to any number between 0 and 1.

Next, even if $\{\mu_n\}$ does converge in some weaker sense, is the limit necessarily
a p.m.? The answer is again no.


Example 2. Let $X_n \equiv c_n$ where $c_n \to +\infty$. Then $X_n \to +\infty$ deterministically.
According to our definition of a r.v., the constant $+\infty$ indeed qualifies. But for
any finite interval $(a, b)$ we have $\lim_n \mu_n((a, b)) = 0$, so any limiting measure must
also be identically zero. This example can be easily ramified; e.g. let $a_n \to -\infty$,
$b_n \to +\infty$ and
$$X_n = \begin{cases} a_n & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ b_n & \text{with probability } \beta. \end{cases}$$
Then $X_n \to X$ where
$$X = \begin{cases} -\infty & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ +\infty & \text{with probability } \beta. \end{cases}$$
For any finite interval $(a, b)$ containing 0 we have
$$\lim_n \mu_n((a, b)) = \lim_n \mu_n(\{0\}) = 1 - \alpha - \beta.$$
In this situation it is said that masses of amount $\alpha$ and $\beta$ “have wandered off to $-\infty$
and $+\infty$ respectively.” The remedy here is obvious: we should consider measures on
the extended line $\mathcal{R}^* = [-\infty, +\infty]$, with possible atoms at $\{+\infty\}$ and $\{-\infty\}$. We
leave this to the reader but proceed to give the appropriate definitions which take into
account the two kinds of troubles discussed above.
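
The loss of mass in Example 2 can be seen numerically. In the sketch below the atoms are placed at $-n$, $0$ and $+n$; this choice, the values of $\alpha$ and $\beta$, and the function name `mass_on_interval` are assumptions made for illustration only.

```python
from fractions import Fraction

def mass_on_interval(n, a, b, alpha, beta):
    """Mass that X_n assigns to the open interval (a, b).

    Assumed form: X_n = -n with prob. alpha, 0 with prob. 1 - alpha - beta,
    and +n with prob. beta.
    """
    atoms = {-n: alpha, 0: 1 - alpha - beta, n: beta}
    return sum(p for x, p in atoms.items() if a < x < b)

alpha, beta = Fraction(1, 4), Fraction(1, 3)
# Mass of the fixed interval (-10, 10) as the two outer atoms move away.
masses = [mass_on_interval(n, -10, 10, alpha, beta) for n in (1, 5, 10, 50, 500)]
print(masses)
```

Once $n \ge 10$ only the atom at 0 remains inside $(-10, 10)$, so the mass seen by any fixed finite interval containing 0 settles at $1 - \alpha - \beta$, which is less than 1.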


DEFINITION. A measure $\mu$ on $(\mathcal{R}^1, \mathcal{B}^1)$ with $\mu(\mathcal{R}^1) \le 1$ will be called a
subprobability measure (s.p.m.).


DEFINITION OF VAGUE CONVERGENCE. A sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is
said to converge vaguely to an s.p.m. $\mu$ iff there exists a dense subset $D$ of
$\mathcal{R}^1$ such that
$$\forall a \in D,\ b \in D,\ a < b:\quad \mu_n((a, b]) \to \mu((a, b]). \tag{1}$$
This will be denoted by
$$\mu_n \xrightarrow{v} \mu \tag{2}$$
and $\mu$ is called the vague limit of $\{\mu_n\}$. We shall see below that it is unique.

For brevity's sake, we will write $\mu((a, b])$ as $\mu(a, b]$ below, and similarly
for other kinds of intervals. An interval $(a, b)$ is called a continuity interval
of $\mu$ iff neither $a$ nor $b$ is an atom of $\mu$; in other words iff $\mu(a, b) = \mu[a, b]$.
As a notational convention, $\mu(a, b) = 0$ when $a \ge b$.


Theorem 4.3.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. The following propositions are
equivalent.

(i) For every finite interval $(a, b)$ and $\epsilon > 0$, there exists an $n_0(a, b, \epsilon)$
such that if $n \ge n_0$, then
$$\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu_n(a, b) \le \mu(a - \epsilon, b + \epsilon) + \epsilon. \tag{3}$$
Here and hereafter the first term is interpreted as 0 if $a + \epsilon > b - \epsilon$.

(ii) For every continuity interval $(a, b]$ of $\mu$, we have
$$\mu_n(a, b] \to \mu(a, b].$$

(iii) $\mu_n \xrightarrow{v} \mu$.

PROOF. To prove that (i) $\Rightarrow$ (ii), let $(a, b)$ be a continuity interval of $\mu$.
It follows from the monotone property of a measure that
$$\lim_{\epsilon \downarrow 0} \mu(a + \epsilon, b - \epsilon) = \mu(a, b) = \mu[a, b] = \lim_{\epsilon \downarrow 0} \mu(a - \epsilon, b + \epsilon).$$
Letting $n \to \infty$, then $\epsilon \downarrow 0$ in (3), we have
$$\mu(a, b) \le \liminf_n \mu_n(a, b) \le \limsup_n \mu_n[a, b] \le \mu[a, b] = \mu(a, b),$$
which proves (ii); indeed $(a, b]$ may be replaced by $(a, b)$ or $[a, b)$ or $[a, b]$
there. Next, since the set of atoms of $\mu$ is countable, the complementary set
$D$ is certainly dense in $\mathcal{R}^1$. If $a \in D$, $b \in D$, then $(a, b)$ is a continuity interval
of $\mu$. This proves (ii) $\Rightarrow$ (iii). Finally, suppose (iii) is true so that (1) holds.
Given any $(a, b)$ and $\epsilon > 0$, there exist $a_1, a_2, b_1, b_2$ all in $D$ satisfying
$$a - \epsilon < a_1 < a < a_2 < a + \epsilon, \qquad b - \epsilon < b_1 < b < b_2 < b + \epsilon.$$
By (1), there exists $n_0$ such that if $n \ge n_0$, then
$$|\mu_n(a_i, b_j] - \mu(a_i, b_j]| \le \epsilon$$
for $i = 1, 2$ and $j = 1, 2$. It follows that
$$\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu(a_2, b_1] - \epsilon \le \mu_n(a_2, b_1] \le \mu_n(a, b) \le \mu_n(a_1, b_2] \le \mu(a_1, b_2] + \epsilon \le \mu(a - \epsilon, b + \epsilon) + \epsilon.$$
Thus (iii) $\Rightarrow$ (i). The theorem is proved.


As an immediate consequence, the vague limit is unique. More precisely,
if besides (1) we have also
$$\forall a \in D',\ b \in D',\ a < b:\quad \mu_n(a, b] \to \mu'(a, b],$$
then $\mu \equiv \mu'$. For let $A$ be the set of atoms of $\mu$ and of $\mu'$; then if $a \in A^c$,
$b \in A^c$, we have by Theorem 4.3.1, (ii):
$$\mu(a, b] \leftarrow \mu_n(a, b] \to \mu'(a, b]$$
so that $\mu(a, b] = \mu'(a, b]$. Now $A^c$ is dense in $\mathcal{R}^1$, hence the two measures $\mu$
and $\mu'$ coincide on a set of intervals that is dense in $\mathcal{R}^1$ and must therefore
be identical (see Corollary to Theorem 2.2.3).


Another consequence is: if $\mu_n \xrightarrow{v} \mu$ and $(a, b)$ is a continuity interval
of $\mu$, then $\mu_n(I) \to \mu(I)$, where $I$ is any of the four kinds of intervals with
endpoints $a$, $b$. For it is clear that we may replace $\mu_n(a, b)$ by $\mu_n[a, b]$ in (3).
In particular, we have $\mu_n(\{a\}) \to 0$, $\mu_n(\{b\}) \to 0$.

The case of strict probability measures will now be treated. 


Theorem 4.3.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then (i), (ii), and (iii) in the
preceding theorem are equivalent to the following “uniform” strengthening
of (i).

(i') For any $\delta > 0$ and $\epsilon > 0$, there exists $n_0(\delta, \epsilon)$ such that if $n \ge n_0$
then we have for every interval $(a, b)$, possibly infinite:
$$\mu(a + \delta, b - \delta) - \epsilon \le \mu_n(a, b) \le \mu(a - \delta, b + \delta) + \epsilon. \tag{4}$$


PROOF. The employment of two numbers $\delta$ and $\epsilon$ instead of one $\epsilon$ as in (3)
is an apparent extension only (why?). Since (i') $\Rightarrow$ (i), the preceding theorem
yields at once (i') $\Rightarrow$ (ii) $\Leftrightarrow$ (iii). It remains to prove (ii) $\Rightarrow$ (i') when the $\mu_n$'s
and $\mu$ are p.m.'s. Let $A$ denote the set of atoms of $\mu$. Then there exist an
integer $\ell$ and $a_j \in A^c$, $1 \le j \le \ell$, satisfying:
$$a_j < a_{j+1} \le a_j + \delta, \qquad 1 \le j \le \ell - 1;$$
and
$$\mu((a_1, a_\ell)^c) < \frac{\epsilon}{4}. \tag{5}$$
By (ii), there exists $n_0$ depending on $\epsilon$ and $\ell$ (and so on $\epsilon$ and $\delta$) such that if
$n \ge n_0$ then
$$\sup_{1 \le j \le \ell - 1} |\mu(a_j, a_{j+1}] - \mu_n(a_j, a_{j+1}]| \le \frac{\epsilon}{4\ell}. \tag{6}$$
It follows by additivity that
$$|\mu(a_1, a_\ell] - \mu_n(a_1, a_\ell]| < \frac{\epsilon}{4}$$
and consequently by (5):
$$\mu_n((a_1, a_\ell)^c) < \frac{\epsilon}{2}. \tag{7}$$
From (5) and (7) we see that by ignoring the part of $(a, b)$ outside $(a_1, a_\ell)$, we
commit at most an error $< \epsilon/2$ in either one of the inequalities in (4). Thus it
is sufficient to prove (4) with $\delta$ and $\epsilon/2$, assuming that $(a, b) \subset (a_1, a_\ell)$. Let
then $a_j \le a < a_{j+1}$ and $a_k \le b < a_{k+1}$, where $1 \le j \le k \le \ell - 1$. The desired
inequalities follow from (6), since when $n \ge n_0$ we have
$$\mu_n(a + \delta, b - \delta) - \frac{\epsilon}{4} \le \mu_n(a_{j+1}, a_k) - \frac{\epsilon}{4} \le \mu(a_{j+1}, a_k) \le \mu(a, b) \le \mu(a_j, a_{k+1}) \le \mu_n(a_j, a_{k+1}) + \frac{\epsilon}{4} \le \mu_n(a - \delta, b + \delta) + \frac{\epsilon}{4}.$$

The set of all s.p.m.'s on $\mathcal{R}^1$ bears close analogy to the set of all
real numbers in $[0, 1]$. Recall that the latter is sequentially compact, which
means: Given any sequence of numbers in the set, there is a subsequence
which converges, and the limit is also a number in the set. This is the
fundamental Bolzano-Weierstrass theorem. We have the following analogue
which states: The set of all s.p.m.'s is sequentially compact with respect to
vague convergence. It is often referred to as “Helly's extraction (or selection)
principle”.


Theorem 4.3.3. Given any sequence of s.p.m.’s, there is a subsequence that 
converges vaguely to an s.p.m. 


PROOF. Here it is convenient to consider the subdistribution function
(s.d.f.) $F_n$ defined as follows:
$$\forall x:\quad F_n(x) = \mu_n(-\infty, x].$$
If $\mu_n$ is a p.m., then $F_n$ is just its d.f. (see Sec. 2.2); in general $F_n$ is increasing,
right continuous with $F_n(-\infty) = 0$ and $F_n(+\infty) = \mu_n(\mathcal{R}^1) \le 1$.

Let $D$ be a countable dense set of $\mathcal{R}^1$, and let $\{r_k, k \ge 1\}$ be an
enumeration of it. The sequence of numbers $\{F_n(r_1), n \ge 1\}$ is bounded, hence
by the Bolzano-Weierstrass theorem there is a subsequence $\{F_{1k}, k \ge 1\}$ of
the given sequence such that the limit
$$\lim_{k \to \infty} F_{1k}(r_1) = \ell_1$$
exists; clearly $0 \le \ell_1 \le 1$. Next, the sequence of numbers $\{F_{1k}(r_2), k \ge 1\}$ is
bounded, hence there is a subsequence $\{F_{2k}, k \ge 1\}$ of $\{F_{1k}, k \ge 1\}$ such that
$$\lim_{k \to \infty} F_{2k}(r_2) = \ell_2$$
where $0 \le \ell_2 \le 1$. Since $\{F_{2k}\}$ is a subsequence of $\{F_{1k}\}$, it converges also at
$r_1$ to $\ell_1$. Continuing, we obtain

$F_{11}, F_{12}, \ldots, F_{1k}, \ldots$ converging at $r_1$;
$F_{21}, F_{22}, \ldots, F_{2k}, \ldots$ converging at $r_1, r_2$;
$\ldots$
$F_{j1}, F_{j2}, \ldots, F_{jk}, \ldots$ converging at $r_1, r_2, \ldots, r_j$;
$\ldots$

Now consider the diagonal sequence $\{F_{kk}, k \ge 1\}$. We assert that it converges
at every $r_j$, $j \ge 1$. To see this let $r_j$ be given. Apart from the first $j - 1$ terms,
the sequence $\{F_{kk}, k \ge 1\}$ is a subsequence of $\{F_{jk}, k \ge 1\}$, which converges
at $r_j$ and hence $\lim_{k \to \infty} F_{kk}(r_j) = \ell_j$, as desired.

We have thus proved the existence of an infinite subsequence $\{n_k\}$ and a
function $G$ defined and increasing on $D$ such that
$$\forall r \in D:\quad \lim_{k \to \infty} F_{n_k}(r) = G(r).$$
From $G$ we define a function $F$ on $\mathcal{R}^1$ as follows:
$$\forall x \in \mathcal{R}^1:\quad F(x) = \inf_{x < r \in D} G(r).$$


By Sec. 1.1, (vii), $F$ is increasing and right continuous. Let $C$ denote the set
of its points of continuity; $C$ is dense in $\mathcal{R}^1$ and we show that
$$\forall x \in C:\quad \lim_{k \to \infty} F_{n_k}(x) = F(x). \tag{8}$$
For, let $x \in C$ and $\epsilon > 0$ be given; there exist $r$, $r'$, and $r''$ in $D$ such that
$r < r' < x < r''$ and $F(r'') - F(r) \le \epsilon$. Then we have
$$F(r) \le G(r') \le F(x) \le G(r'') \le F(r'') \le F(r) + \epsilon$$
and
$$F_{n_k}(r') \le F_{n_k}(x) \le F_{n_k}(r'').$$
From these (8) follows, since $\epsilon$ is arbitrary.

To $F$ corresponds a (unique) s.p.m. $\mu$ such that
$$F(x) - F(-\infty) = \mu(-\infty, x]$$
as in Theorem 2.2.2. Now the relation (8) yields, upon taking differences:
$$\forall a \in C,\ b \in C,\ a < b:\quad \lim_{k \to \infty} \mu_{n_k}(a, b] = \mu(a, b].$$
Thus $\mu_{n_k} \xrightarrow{v} \mu$, and the theorem is proved.


We say that $F_n$ converges vaguely to $F$ and write $F_n \xrightarrow{v} F$ for $\mu_n \xrightarrow{v} \mu$,
where $\mu_n$ and $\mu$ are the s.p.m.'s corresponding to the s.d.f.'s $F_n$ and $F$.

The reader should be able to confirm the truth of the following proposition 
about real numbers. Let {x,} be a sequence of real numbers such that every 
subsequence that tends to a limit ($\pm\infty$ allowed) has the same value for the
limit; then the whole sequence tends to this limit. In particular a bounded 
sequence such that every convergent subsequence has the same limit is 
convergent to this limit. 

The next theorem generalizes this result to vague convergence of s.p.m.’s. 
It is not contained in the preceding proposition but can be reduced to it if we 
use the properties of vague convergence; see also Exercise 9 below. 


Theorem 4.3.4. If every vaguely convergent subsequence of the sequence
of s.p.m.'s $\{\mu_n\}$ converges to the same $\mu$, then $\mu_n \xrightarrow{v} \mu$.


PROOF. To prove the theorem by contraposition, suppose $\mu_n$ does not
converge vaguely to $\mu$. Then by Theorem 4.3.1, (ii), there exists a continuity
interval $(a, b)$ of $\mu$ such that $\mu_n(a, b)$ does not converge to $\mu(a, b)$. By
the Bolzano-Weierstrass theorem there exists a subsequence $\{n_k\}$ tending to
infinity such that the numbers $\mu_{n_k}(a, b)$ converge to a limit, say $L \ne \mu(a, b)$.
By Theorem 4.3.3, the sequence $\{\mu_{n_k}, k \ge 1\}$ contains a subsequence, say
$\{\mu_{n_k'}, k \ge 1\}$, which converges vaguely, hence to $\mu$ by hypothesis of the
theorem. Hence again by Theorem 4.3.1, (ii), we have
$$\mu_{n_k'}(a, b) \to \mu(a, b).$$
But the left side also $\to L$, which is a contradiction.


EXERCISES 


1. Perhaps the most logical approach to vague convergence is as follows.
The sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely iff there
exists a dense subset $D$ of $\mathcal{R}^1$ such that for every $a \in D$, $b \in D$, $a < b$, the
sequence $\{\mu_n(a, b), n \ge 1\}$ converges. The definition given before implies this,
of course, but prove the converse.

2. Prove that if (1) is true, then there exists a dense set $D'$, such that
$\mu_n(I) \to \mu(I)$ where $I$ may be any of the four intervals $(a, b)$, $(a, b]$, $[a, b)$,
$[a, b]$ with $a \in D'$, $b \in D'$.

3. Can a sequence of absolutely continuous p.m.’s converge vaguely to 
a discrete p.m.? Can a sequence of discrete p.m.’s converge vaguely to an 
absolutely continuous p.m.? 

4. If a sequence of p.m.'s converges vaguely to an atomless p.m., then
the convergence is uniform for all intervals, finite or infinite. (This is due to
Pólya.)

5. Let $\{f_n\}$ be a sequence of functions increasing in $\mathcal{R}^1$ and uniformly
bounded there: $\sup_{n,x} |f_n(x)| \le M < \infty$. Prove that there exists an increasing
function $f$ on $\mathcal{R}^1$ and a subsequence $\{n_k\}$ such that $f_{n_k}(x) \to f(x)$ for every
$x$. (This is a form of Theorem 4.3.3 frequently given; the insistence on “every
$x$” requires an additional argument.)

6. Let $\{\mu_n\}$ be a sequence of finite measures on $\mathcal{B}^1$. It is said to converge
vaguely to a measure $\mu$ iff (1) holds. The limit $\mu$ is not necessarily a finite
measure. But if $\mu_n(\mathcal{R}^1)$ is bounded in $n$, then $\mu$ is finite.

7. If $\mathcal{P}_n$ is a sequence of p.m.'s on $(\Omega, \mathcal{F})$ such that $\mathcal{P}_n(E)$ converges
for every $E \in \mathcal{F}$, then the limit is a p.m. $\mathcal{P}$. Furthermore, if $f$ is bounded
and $\mathcal{F}$-measurable, then
$$\int_\Omega f\,d\mathcal{P}_n \to \int_\Omega f\,d\mathcal{P}.$$
(The first assertion is the Vitali-Hahn-Saks theorem and rather deep, but it
can be proved by reducing it to a problem of summability; see A. Rényi, [24].)

8. If $\mu_n$ and $\mu$ are p.m.'s and $\mu_n(E) \to \mu(E)$ for every open set $E$, then
this is also true for every Borel set. [HINT: Use (7) of Sec. 2.2.]

9. Prove a convergence theorem in metric space that will include both 
Theorem 4.3.3 for p.m.’s and the analogue for real numbers given before the 
theorem. [HINT: Use Exercise 9 of Sec. 4.4.] 


4.4 Continuation 


We proceed to discuss another kind of criterion, which is becoming ever more
popular in measure theory as well as functional analysis. This has to do with
classes of continuous functions on $\mathcal{R}^1$.

$C_K$ = the class of continuous functions $f$ each vanishing outside a
compact set $K(f)$;
$C_0$ = the class of continuous functions $f$ such that
$\lim_{|x| \to \infty} f(x) = 0$;
$C_B$ = the class of bounded continuous functions;
$C$ = the class of continuous functions.

We have $C_K \subset C_0 \subset C_B \subset C$. It is well known that $C_0$ is the closure of $C_K$
with respect to uniform convergence.

An arbitrary function $f$ defined on an arbitrary space is said to have
support in a subset $S$ of the space iff it vanishes outside $S$. Thus if $f \in C_K$,
then it has support in a certain compact set, hence also in a certain compact
interval. A step function on a finite or infinite interval $(a, b)$ is one with support
in it such that $f(x) = c_j$ for $x \in (a_j, a_{j+1})$ for $1 \le j \le \ell$, where $\ell$ is finite,
$a = a_1 < \cdots < a_{\ell+1} = b$, and the $c_j$'s are arbitrary real numbers. It will be
called $D$-valued iff all the $a_j$'s and $c_j$'s belong to a given set $D$. When the
interval $(a, b)$ is $\mathcal{R}^1$, $f$ is called just a step function. Note that the values of
$f$ at the points $a_j$ are left unspecified to allow for flexibility; frequently they
are defined by right or left continuity. The following lemma is basic.


Approximation Lemma. Suppose that $f \in C_K$ has support in the compact
interval $[a, b]$. Given any dense subset $A$ of $\mathcal{R}^1$ and $\epsilon > 0$, there exists an
$A$-valued step function $f_\epsilon$ on $(a, b)$ such that
$$\sup_{x \in \mathcal{R}^1} |f(x) - f_\epsilon(x)| \le \epsilon. \tag{1}$$
If $f \in C_0$, the same is true if $(a, b)$ is replaced by $\mathcal{R}^1$.
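
A concrete instance of the lemma may help. The sketch below takes $A$ to be the rationals and $f$ the "tent" function $\max(0, 1 - |x|)$, both of which are assumptions chosen for illustration; the function names are hypothetical. Since this $f$ is Lipschitz with constant 1, a mesh of width at most $\epsilon$ suffices.

```python
from fractions import Fraction

def tent(x):
    """A function in C_K with support in [-1, 1]."""
    return max(Fraction(0), 1 - abs(x))

def step_approximation(eps, a=Fraction(-1), b=Fraction(1)):
    """Return rational breakpoints and rational values of a step function on (a, b)."""
    pieces = 1
    while (b - a) / pieces > eps:      # shrink the mesh until it is at most eps
        pieces += 1
    pts = [a + (b - a) * Fraction(j, pieces) for j in range(pieces + 1)]
    vals = [tent(p) for p in pts[:-1]]  # value of f at the left endpoint of each piece
    return pts, vals

def evaluate(pts, vals, x):
    """Evaluate the step function, defined at breakpoints by right continuity."""
    for lo, hi, c in zip(pts, pts[1:], vals):
        if lo <= x < hi:
            return c
    return Fraction(0)                  # the step function vanishes outside [a, b)

eps = Fraction(1, 10)
pts, vals = step_approximation(eps)
grid = [Fraction(k, 997) for k in range(-1500, 1501)]
worst = max(abs(tent(x) - evaluate(pts, vals, x)) for x in grid)
print(len(vals), worst)
```

The maximum is taken over a finite grid only, so it suggests rather than proves the uniform bound (1); the proof is the Lipschitz estimate stated above.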


This lemma becomes obvious as soon as its geometric meaning is grasped.
In fact, for any $f$ in $C_K$, one may even require that either $f_\epsilon \le f$ or $f_\epsilon \ge f$.
The problem is then that of the approximation of the graph of a plane curve
by inscribed or circumscribed polygons, as treated in elementary calculus. But
let us remark that the lemma is also a particular case of the Stone-Weierstrass
theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such
a sledgehammer approach has its merit, as other kinds of approximation 
soon to be needed can also be subsumed under the same theorem. Indeed, 
the discussion in this section is meant in part to introduce some modern 
terminology to the relevant applications in probability theory. We can now 
state the following alternative criterion for vague convergence. 


Theorem 4.4.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if
$$\forall f \in C_K\ [\text{or } C_0]:\quad \int_{\mathcal{R}^1} f(x)\,\mu_n(dx) \to \int_{\mathcal{R}^1} f(x)\,\mu(dx). \tag{2}$$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$; (2) is true by definition when $f$ is the indicator
of $(a, b]$ for $a \in D$, $b \in D$, where $D$ is the set in (1) of Sec. 4.3. Hence by the
linearity of integrals it is also true when $f$ is any $D$-valued step function. Now
let $f \in C_0$ and $\epsilon > 0$; by the approximation lemma there exists a $D$-valued
step function $f_\epsilon$ satisfying (1). We have
$$\left|\int f\,d\mu_n - \int f\,d\mu\right| \le \left|\int (f - f_\epsilon)\,d\mu_n\right| + \left|\int f_\epsilon\,d\mu_n - \int f_\epsilon\,d\mu\right| + \left|\int (f_\epsilon - f)\,d\mu\right|. \tag{3}$$
By the modulus inequality and mean value theorem for integrals (see Sec. 3.2),
the first term on the right side above is bounded by
$$\int |f - f_\epsilon|\,d\mu_n \le \epsilon \int d\mu_n \le \epsilon;$$
similarly for the third term. The second term converges to zero as $n \to \infty$
because $f_\epsilon$ is a $D$-valued step function. Hence the left side of (3) is bounded
by $2\epsilon$ as $n \to \infty$, and so converges to zero since $\epsilon$ is arbitrary.

Conversely, suppose (2) is true for $f \in C_K$. Let $A$ be the set of atoms
of $\mu$ as in the proof of Theorem 4.3.2; we shall show that vague convergence
holds with $D = A^c$. Let $g = 1_{(a, b]}$ be the indicator of $(a, b]$ where $a \in D$,
$b \in D$. Then, given $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that $a + \delta < b - \delta$, and
such that $\mu(U) < \epsilon$ where
$$U = (a - \delta, a + \delta) \cup (b - \delta, b + \delta).$$
Now define $g_1$ to be the function that coincides with $g$ on $(-\infty, a] \cup [a + \delta, b - \delta] \cup [b, \infty)$
and that is linear in $(a, a + \delta)$ and in $(b - \delta, b)$; $g_2$ to be
the function that coincides with $g$ on $(-\infty, a - \delta] \cup [a, b] \cup [b + \delta, \infty)$ and
that is linear in $(a - \delta, a)$ and $(b, b + \delta)$. It is clear that $g_1 \le g \le g_2 \le g_1 + 1$
and consequently
$$\int g_1\,d\mu_n \le \int g\,d\mu_n \le \int g_2\,d\mu_n, \tag{4}$$
$$\int g_1\,d\mu \le \int g\,d\mu \le \int g_2\,d\mu. \tag{5}$$
Since $g_1 \in C_K$, $g_2 \in C_K$, it follows from (2) that the extreme terms in (4)
converge to the corresponding terms in (5). Since
$$\int g_2\,d\mu - \int g_1\,d\mu \le \int_U 1\,d\mu = \mu(U) < \epsilon,$$
and $\epsilon$ is arbitrary, it follows that the middle term in (4) also converges to that
in (5), proving the assertion.


Corollary. If $\{\mu_n\}$ is a sequence of s.p.m.'s such that for every $f \in C_K$,
$$\lim_n \int_{\mathcal{R}^1} f(x)\,\mu_n(dx)$$
exists, then $\{\mu_n\}$ converges vaguely.

For by Theorem 4.3.3 a subsequence converges vaguely, say to $\mu$. By
Theorem 4.4.1, the limit above is equal to $\int_{\mathcal{R}^1} f(x)\,\mu(dx)$. This must then
be the same for every vaguely convergent subsequence, according to the
hypothesis of the corollary. The vague limit of every such sequence is
therefore uniquely determined (why?) to be $\mu$, and the corollary follows from
Theorem 4.3.4.


Theorem 4.4.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if
$$\forall f \in C_B:\quad \int_{\mathcal{R}^1} f(x)\,\mu_n(dx) \to \int_{\mathcal{R}^1} f(x)\,\mu(dx). \tag{6}$$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$. Given $\epsilon > 0$, there exist $a$ and $b$ in $D$ such that
$$\mu((a, b]^c) = 1 - \mu((a, b]) < \epsilon. \tag{7}$$
It follows from vague convergence that there exists $n_0(\epsilon)$ such that if
$n \ge n_0(\epsilon)$, then
$$\mu_n((a, b]^c) = 1 - \mu_n((a, b]) < \epsilon. \tag{8}$$
Let $f \in C_B$ be given and suppose that $|f| \le M < \infty$. Consider the function
$f_\epsilon$, which is equal to $f$ on $[a, b]$, to zero on $(-\infty, a - 1) \cup (b + 1, \infty)$,
and which is linear in $[a - 1, a)$ and in $(b, b + 1]$. Then $f_\epsilon \in C_K$ and
$|f - f_\epsilon| \le 2M$. We have by Theorem 4.4.1
$$\int_{\mathcal{R}^1} f_\epsilon\,d\mu_n \to \int_{\mathcal{R}^1} f_\epsilon\,d\mu. \tag{9}$$
On the other hand, we have
$$\int_{\mathcal{R}^1} |f - f_\epsilon|\,d\mu_n \le \int_{(a, b]^c} 2M\,d\mu_n \le 2M\epsilon \tag{10}$$
by (8). A similar estimate holds with $\mu$ replacing $\mu_n$ above, by (7). Now the
argument leading from (3) to (2) finishes the proof of (6) in the same way.
This proves that $\mu_n \xrightarrow{v} \mu$ implies (6); the converse has already been proved
in Theorem 4.4.1.


Theorems 4.3.3 and 4.3.4 deal with s.p.m.'s. Even if the given sequence
$\{\mu_n\}$ consists only of strict p.m.'s, the sequential vague limit may not be so.
This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that
such a limit be a p.m. The following criterion is not deep, but applicable.
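
The gap between Theorems 4.4.1 and 4.4.2 shows up already for very simple measures. In the sketch below we take $\mu_n = \tfrac12\delta_0 + \tfrac12\delta_n$, whose vague limit is the s.p.m. $\tfrac12\delta_0$; this example and the function names are assumptions made for illustration.

```python
def tent(x):
    """A function in C_K: continuous, supported in [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

def one(x):
    """The constant 1, which belongs to C_B but not to C_0."""
    return 1.0

def integrate(f, n):
    """Integral of f against mu_n = (1/2) delta_0 + (1/2) delta_n."""
    return 0.5 * f(0.0) + 0.5 * f(float(n))

def integrate_limit(f):
    """Integral of f against the vague limit (1/2) delta_0, of total mass 1/2."""
    return 0.5 * f(0.0)

ns = [1, 2, 5, 50]
ck_gap = [integrate(tent, n) - integrate_limit(tent) for n in ns]
cb_gap = [integrate(one, n) - integrate_limit(one) for n in ns]
print(ck_gap, cb_gap)
```

For the $C_K$ function the integrals agree with those of the limit, as (2) requires, while for the bounded function $f \equiv 1$ a gap of $\tfrac12$ persists: (6) fails because the limit is not a p.m.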


Theorem 4.4.3. Let a family of p.m.'s $\{\mu_\alpha, \alpha \in A\}$ be given on an arbitrary
index set $A$. In order that every sequence of them contains a subsequence
which converges vaguely to a p.m., it is necessary and sufficient that the
following condition be satisfied: for any $\epsilon > 0$, there exists a finite interval $I$
such that
$$\inf_{\alpha \in A} \mu_\alpha(I) > 1 - \epsilon. \tag{11}$$


PROOF. Suppose (11) holds. For any sequence $\{\mu_n\}$ from the family, there
exists a subsequence $\{\mu_n'\}$ such that $\mu_n' \xrightarrow{v} \mu$. We show that $\mu$ is a p.m. Let $J$
be a continuity interval of $\mu$ which contains the $I$ in (11). Then
$$\mu(\mathcal{R}^1) \ge \mu(J) = \lim_n \mu_n'(J) \ge \liminf_n \mu_n'(I) \ge 1 - \epsilon.$$
Since $\epsilon$ is arbitrary, $\mu(\mathcal{R}^1) = 1$. Conversely, suppose the condition involving
(11) is not satisfied, then there exists $\epsilon > 0$, a sequence of finite intervals $I_n$
increasing to $\mathcal{R}^1$, and a sequence $\{\mu_n\}$ from the family such that
$$\forall n:\quad \mu_n(I_n) \le 1 - \epsilon.$$
Let $\{\mu_n'\}$ and $\mu$ be as before and $J$ any continuity interval of $\mu$. Then $J \subset I_n$
for all sufficiently large $n$, so that
$$\mu(J) = \lim_n \mu_n'(J) \le \liminf_n \mu_n'(I_n) \le 1 - \epsilon.$$
Thus $\mu(\mathcal{R}^1) \le 1 - \epsilon$ and $\mu$ is not a p.m. The theorem is proved.

A family of p.m.'s satisfying the condition above involving (11) is said to
be tight. The preceding theorem can be stated as follows: a family of p.m.'s is
relatively compact if and only if it is tight. The word “relatively” purports that
the limit need not belong to the family; the word “compact” is an abbreviation
of “sequentially vaguely convergent to a strict p.m.” Extension of the result
to p.m.'s in more general topological spaces is straightforward but plays an
important role in the convergence of stochastic processes.
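
Condition (11) can be explored numerically for normal distributions with unit variance. The two families below (means $\sin k$ and means $k$) and the helper names are assumptions chosen for illustration; only finitely many members are examined, so the output suggests rather than proves tightness or its failure.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Distribution function of the normal law, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def worst_mass(means, half_width):
    """Smallest mass given to [-half_width, half_width] by N(m, 1), m in means."""
    return min(
        normal_cdf(half_width, m) - normal_cdf(-half_width, m) for m in means
    )

bounded_means = [math.sin(k) for k in range(200)]    # means stay in [-1, 1]
drifting_means = [float(k) for k in range(200)]      # means 0, 1, ..., 199

tight_mass = worst_mass(bounded_means, 5.0)
loose_mass = worst_mass(drifting_means, 5.0)
print(tight_mass, loose_mass)
```

With bounded means a single interval captures almost all the mass of every member, in line with (11). With drifting means the infimum over the members examined is already negligible for the fixed interval $[-5, 5]$, and the same happens for any other fixed finite interval once enough members are included.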

The new definition of vague convergence in Theorem 4.4.1 has the
advantage over the older ones in that it can be at once carried over to measures
in more general topological spaces. There is no substitute for “intervals” in
such a space but the classes $C_K$, $C_0$ and $C_B$ are readily available. We will
illustrate the general approach by indicating one more result in this direction.

Recall the notion of a lower semicontinuous function on $\mathcal{R}^1$ defined by:
$$\forall x \in \mathcal{R}^1:\quad f(x) \le \liminf_{y \to x,\ y \ne x} f(y). \tag{12}$$
There are several equivalent definitions (see, e.g., Royden [5]) but
the following characterization is most useful: $f$ is bounded and lower
semicontinuous if and only if there exists a sequence of functions $f_k \in C_B$
which increases to $f$ everywhere, and we call $f$ upper semicontinuous iff $-f$
is lower semicontinuous. Usually $f$ is allowed to be extended-valued; but to
avoid complications we will deal with bounded functions only and denote
by $L$ and $U$ respectively the classes of bounded lower semicontinuous and
bounded upper semicontinuous functions.


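Theorem 4.4.4. If $\{\mu_n\}$ and $\mu$ are p.m.'s, then $\mu_n \xrightarrow{v} \mu$ if and only if one
of the two conditions below is satisfied:
$$\forall f \in L:\quad \liminf_n \int f(x)\,\mu_n(dx) \ge \int f(x)\,\mu(dx); \tag{13}$$
$$\forall g \in U:\quad \limsup_n \int g(x)\,\mu_n(dx) \le \int g(x)\,\mu(dx).$$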


PROOF. We begin by observing that the two conditions above are equivalent
by putting $f = -g$. Now suppose $\mu_n \xrightarrow{v} \mu$ and let $f_k \in C_B$, $f_k \uparrow f$.
Then we have
$$\liminf_n \int f(x)\,\mu_n(dx) \ge \lim_n \int f_k(x)\,\mu_n(dx) = \int f_k(x)\,\mu(dx) \tag{14}$$
by Theorem 4.4.2. Letting $k \to \infty$ the last integral above converges to
$\int f(x)\,\mu(dx)$ by monotone convergence. This gives the first inequality in (13).
Conversely, suppose the latter is satisfied and let $\varphi \in C_B$, then $\varphi$ belongs to
both $L$ and $U$, so that
$$\int \varphi(x)\,\mu(dx) \le \liminf_n \int \varphi(x)\,\mu_n(dx) \le \limsup_n \int \varphi(x)\,\mu_n(dx) \le \int \varphi(x)\,\mu(dx)$$
which proves
$$\lim_n \int \varphi(x)\,\mu_n(dx) = \int \varphi(x)\,\mu(dx).$$
Hence $\mu_n \xrightarrow{v} \mu$ by Theorem 4.4.2.


Remark. (13) remains true if, e.g., $f$ is lower semicontinuous, with $+\infty$
as a possible value but bounded below.


Corollary. The conditions in (13) may be replaced by the following:

for every open $O$: $\liminf_n \mu_n(O) \ge \mu(O)$;
for every closed $C$: $\limsup_n \mu_n(C) \le \mu(C)$.

We leave this as an exercise.

Finally, we return to the connection between the convergence of r.v.'s
and that of their distributions.


DEFINITION OF CONVERGENCE “IN DISTRIBUTION” (in dist.). A sequence of
r.v.'s $\{X_n\}$ is said to converge in distribution to $F$ iff the sequence $\{F_n\}$
of corresponding d.f.'s converges vaguely to the d.f. $F$.


If $X$ is an r.v. that has the d.f. $F$, then by an abuse of language we shall
also say that $\{X_n\}$ converges in dist. to $X$.


Theorem 4.4.5. Let $\{F_n\}$, $F$ be the d.f.'s of the r.v.'s $\{X_n\}$, $X$. If $X_n \to X$ in
pr., then $F_n \xrightarrow{v} F$. More briefly stated, convergence in pr. implies convergence
in dist.


PROOF. If $X_n \to X$ in pr., then for each $f \in C_K$, we have $f(X_n) \to f(X)$
in pr. as easily seen from the uniform continuity of $f$ (actually this is true
for any continuous $f$, see Exercise 10 of Sec. 4.1). Since $f$ is bounded the
convergence holds also in $L^1$ by Theorem 4.1.4. It follows that
$$\mathcal{E}\{f(X_n)\} \to \mathcal{E}\{f(X)\},$$
which is just the relation (2) in another guise, hence $\mu_n \xrightarrow{v} \mu$.


Convergence of r.v.'s in dist. is merely a convenient turn of speech; it
does not have the usual properties associated with convergence. For instance,
if $X_n \to X$ in dist. and $Y_n \to Y$ in dist., it does not follow by any means
that $X_n + Y_n$ will converge in dist. to $X + Y$. This is in contrast to the true
convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. 4.1. But
if $X_n$ and $Y_n$ are independent, then the preceding assertion is indeed true as a
property of the convergence of convolutions of distributions (see Chapter 6).
However, in the simple situation of the next theorem no independence assumption
is needed. The result is useful in dealing with limit distributions in the
presence of nuisance terms.


Theorem 4.4.6. If $X_n \to X$ in dist., and $Y_n \to 0$ in dist., then
(a) $X_n + Y_n \to X$ in dist.
(b) $X_n Y_n \to 0$ in dist.


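PROOF. We begin with the remark that for any constant $c$, $Y_n \to c$ in dist.
is equivalent to $Y_n \to c$ in pr. (Exercise 4 below). To prove (a), let $f \in C_K$,
$|f| \le M$. Since $f$ is uniformly continuous, given $\epsilon > 0$ there exists $\delta$ such
that $|x - y| \le \delta$ implies $|f(x) - f(y)| \le \epsilon$. Hence we have
$$\mathcal{E}\{|f(X_n + Y_n) - f(X_n)|\} \le \epsilon\,\mathcal{P}\{|f(X_n + Y_n) - f(X_n)| \le \epsilon\} + 2M\,\mathcal{P}\{|f(X_n + Y_n) - f(X_n)| > \epsilon\} \le \epsilon + 2M\,\mathcal{P}\{|Y_n| > \delta\}.$$
The last-written probability tends to zero as $n \to \infty$; it follows that
$$\lim_{n \to \infty} \mathcal{E}\{f(X_n + Y_n)\} = \lim_{n \to \infty} \mathcal{E}\{f(X_n)\} = \mathcal{E}\{f(X)\}$$
by Theorem 4.4.1, and (a) follows by the same theorem.

To prove (b), for given $\epsilon > 0$ we choose $A_0$ so that both $\pm A_0$ are points
of continuity of the d.f. of $X$, and so large that
$$\lim_{n \to \infty} \mathcal{P}\{|X_n| > A_0\} = \mathcal{P}\{|X| > A_0\} < \epsilon.$$
This means $\mathcal{P}\{|X_n| > A_0\} < \epsilon$ for $n > n_0(\epsilon)$. Furthermore we choose $A \ge A_0$
so that the same inequality holds also for $n \le n_0(\epsilon)$. Now it is clear that
$$\mathcal{P}\{|X_n Y_n| > \epsilon\} \le \mathcal{P}\{|X_n| > A\} + \mathcal{P}\left\{|Y_n| > \frac{\epsilon}{A}\right\} \le \epsilon + \mathcal{P}\left\{|Y_n| > \frac{\epsilon}{A}\right\}.$$
The last-written probability tends to zero as $n \to \infty$, and (b) follows.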
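
The warning that $X_n \to X$ and $Y_n \to Y$ in dist. need not give $X_n + Y_n \to X + Y$ in dist. can be verified exactly on a two-point space. The construction below ($X_n = X$ symmetric on $\{-1, +1\}$ and $Y_n = -X$ for every $n$) is an assumed example, and `law` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction

def law(values_with_probs):
    """Collapse (value, probability) pairs into a distribution {value: probability}."""
    out = Counter()
    for value, prob in values_with_probs:
        out[value] += prob
    return dict(out)

half = Fraction(1, 2)
outcomes = [(-1, half), (1, half)]      # X takes the values -1 and +1

law_X = law([(x, p) for x, p in outcomes])
law_Y = law([(-x, p) for x, p in outcomes])            # Y_n = -X has the law of X
law_X_plus_Y = law([(x + (-x), p) for x, p in outcomes])  # X_n + Y_n is identically 0
law_2X = law([(2 * x, p) for x, p in outcomes])           # the law of X + X
print(law_X == law_Y, law_X_plus_Y, law_2X)
```

Here $Y_n \to X$ in dist. trivially, since $-X$ and $X$ have the same distribution, yet $X_n + Y_n \equiv 0$ does not converge in dist. to $X + X = 2X$. Theorem 4.4.6 avoids this because its limit for $Y_n$ is a constant.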


Corollary. If $X_n \to X$, $\alpha_n \to a$, $\beta_n \to b$, all in dist. where $a$ and $b$ are
constants, then $\alpha_n X_n + \beta_n \to aX + b$ in dist.


EXERCISES


*1. Let $\mu_n$ and $\mu$ be p.m.'s such that $\mu_n \xrightarrow{v} \mu$. Show that the conclusion
in (2) need not hold if (a) $f$ is bounded and Borel measurable and all $\mu_n$
and $\mu$ are absolutely continuous, or (b) $f$ is continuous except at one point
and every $\mu_n$ is absolutely continuous. (To find even sharper counterexamples
would not be too easy, in view of Exercise 10 of Sec. 4.5.)

2. Let $\mu_n \xrightarrow{v} \mu$ when the $\mu_n$'s are s.p.m.'s. Then for each $f \in C$ and
each finite continuity interval $I$ we have $\int_I f\,d\mu_n \to \int_I f\,d\mu$.

*3. Let $\mu_n$ and $\mu$ be as in Exercise 1. If the $f_n$'s are bounded continuous
functions converging uniformly to $f$, then $\int f_n\,d\mu_n \to \int f\,d\mu$.

*4. Give an example to show that convergence in dist. does not imply that
in pr. However, show that convergence to the unit mass $\delta_a$ does imply that in
pr. to the constant $a$.

5. A set $\{\mu_\alpha\}$ of p.m.'s is tight if and only if the corresponding d.f.'s
$\{F_\alpha\}$ converge uniformly in $\alpha$ as $x \to -\infty$ and as $x \to +\infty$.

6. Let the r.v.'s $\{X_\alpha\}$ have the p.m.'s $\{\mu_\alpha\}$. If for some real $r > 0$,
$\mathcal{E}\{|X_\alpha|^r\}$ is bounded in $\alpha$, then $\{\mu_\alpha\}$ is tight.

7. Prove the Corollary to Theorem 4.4.4.

8. If the r.v.'s $X$ and $Y$ satisfy
$$\mathcal{P}\{|X - Y| \ge \epsilon\} \le \epsilon$$
for some $\epsilon$, then their d.f.'s $F$ and $G$ satisfy the inequalities:
$$\forall x \in \mathcal{R}^1:\quad F(x - \epsilon) - \epsilon \le G(x) \le F(x + \epsilon) + \epsilon. \tag{15}$$
Derive another proof of Theorem 4.4.5 from this.


*9. The Lévy distance of two s.d.f.'s $F$ and $G$ is defined to be the infimum
of all $\epsilon > 0$ satisfying the inequalities in (15). Prove that this is indeed a metric
in the space of s.d.f.'s, and that $F_n$ converges to $F$ in this metric if and only
if $F_n \xrightarrow{v} F$ and $\int_{-\infty}^{\infty} dF_n \to \int_{-\infty}^{\infty} dF$.

10. Find two sequences of p.m.'s $\{\mu_n\}$ and $\{\nu_n\}$ such that
$$\forall f \in C_K:\quad \int f\,d\mu_n - \int f\,d\nu_n \to 0;$$
but for no finite $(a, b)$ is it true that
$$\mu_n(a, b) - \nu_n(a, b) \to 0.$$
[HINT: Let $\mu_n = \delta_{r_n}$, $\nu_n = \delta_{s_n}$ and choose $\{r_n\}$, $\{s_n\}$ suitably.]

11. Let $\{\mu_n\}$ be a sequence of p.m.'s such that for each $f \in C_B$, the
sequence $\int_{\mathcal{R}^1} f\,d\mu_n$ converges; then $\mu_n \xrightarrow{v} \mu$, where $\mu$ is a p.m. [HINT: If the
hypothesis is strengthened to include every $f$ in $C$, and convergence of real
numbers is interpreted as usual as convergence to a finite limit, the result is
easy by taking an $f$ going to $\infty$. In general one may proceed by contradiction
using an $f$ that oscillates at infinity.]

*12. Let $F_n$ and $F$ be d.f.'s such that $F_n \xrightarrow{v} F$. Define $G_n(\theta)$ and $G(\theta)$
as in Exercise 4 of Sec. 3.1. Then $G_n(\theta) \to G(\theta)$ in a.e. [HINT: Do this first
when $F_n$ and $F$ are continuous and strictly increasing. The general case is
obtained by smoothing $F_n$ and $F$ by convoluting with a uniform distribution
in $[-\delta, +\delta]$ and letting $\delta \downarrow 0$; see the end of Sec. 6.1 below.]


4.5 Uniform integrability; convergence of moments 


The function $|x|^r$, $r > 0$, is in $C$ but not in $C_B$, hence Theorem 4.4.2 does
not apply to it. Indeed, we have seen in Example 2 of Sec. 4.1 that even
convergence a.e. does not imply convergence of any moment of order $r > 0$.
For, given $r$, a slight modification of that example will yield $X_n \to X$ a.e.,
$\mathcal{E}(|X_n|^r) = 1$ but $\mathcal{E}(|X|^r) = 0$.
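
One way to realize such a sequence is sketched below. Since Example 2 of Sec. 4.1 is not reproduced in this section, the specific choice here ($X_n = n^{1/r}$ on $(0, 1/n)$ and $0$ elsewhere, on $(0, 1]$ with Lebesgue measure) is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction

def moment_r(n):
    """E|X_n|^r when X_n = n^(1/r) on (0, 1/n): height^r times width = n * (1/n)."""
    return n * Fraction(1, n)

def x_n(n, omega, r):
    """Value of X_n at the sample point omega in (0, 1]."""
    return n ** (1.0 / r) if 0 < omega < Fraction(1, n) else 0.0

r = 2.0
omega = Fraction(1, 100)                        # a fixed sample point
path = [x_n(n, omega, r) for n in (1, 10, 99, 100, 101, 1000)]
moments = [moment_r(n) for n in (1, 10, 100, 1000)]
print(path, moments)
```

At each fixed $\omega > 0$ the values $X_n(\omega)$ vanish as soon as $1/n \le \omega$, so $X_n \to 0$ everywhere on $(0, 1]$, while the $r$th absolute moment stays equal to 1 for every $n$.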

It is useful to have conditions to ensure the convergence of moments
when $X_n$ converges a.e. We begin with a standard theorem in this direction
from classical analysis.


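Theorem 4.5.1. If $X_n \to X$ a.e., then for every $r > 0$:
$$\mathcal{E}(|X|^r) \le \liminf_{n \to \infty} \mathcal{E}(|X_n|^r). \tag{1}$$
If $X_n \to X$ in $L^r$, and $X \in L^r$, then $\mathcal{E}(|X_n|^r) \to \mathcal{E}(|X|^r)$.


PROOF. (1) is just a case of Fatou's lemma (see Sec. 3.2):
$$\int_\Omega |X|^r\,d\mathcal{P} = \int_\Omega \lim_n |X_n|^r\,d\mathcal{P} \le \liminf_n \int_\Omega |X_n|^r\,d\mathcal{P},$$
where $+\infty$ is allowed as a value for each member with the usual convention.
In case of convergence in $L^r$, $r > 1$, we have by Minkowski's inequality
(Sec. 3.2), since $X = X_n + (X - X_n) = X_n - (X_n - X)$:
$$\mathcal{E}(|X_n|^r)^{1/r} - \mathcal{E}(|X_n - X|^r)^{1/r} \le \mathcal{E}(|X|^r)^{1/r} \le \mathcal{E}(|X_n|^r)^{1/r} + \mathcal{E}(|X_n - X|^r)^{1/r}.$$
Letting $n \to \infty$ we obtain the second assertion of the theorem. For $0 < r \le 1$,
the inequality $|x + y|^r \le |x|^r + |y|^r$ implies that
$$\mathcal{E}(|X_n|^r) - \mathcal{E}(|X - X_n|^r) \le \mathcal{E}(|X|^r) \le \mathcal{E}(|X_n|^r) + \mathcal{E}(|X - X_n|^r),$$
whence the same conclusion.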


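The next result should be compared with Theorem 4.1.4.


Theorem 4.5.2. If $\{X_n\}$ converges in dist. to $X$, and for some $p > 0$,
$\sup_n \mathcal{E}\{|X_n|^p\} = M < \infty$, then for each $r < p$:
$$\lim_{n \to \infty} \mathcal{E}(|X_n|^r) = \mathcal{E}(|X|^r) < \infty. \tag{2}$$
If $r$ is a positive integer, then we may replace $|X_n|^r$ and $|X|^r$ above by $X_n^r$
and $X^r$.


PROOF. We prove the second assertion since the first is similar. Let $F_n$, $F$
be the d.f.'s of $X_n$, $X$; then $F_n \xrightarrow{v} F$. For $A > 0$ define $f_A$ on $\mathcal{R}^1$ as follows:
$$f_A(x) = \begin{cases} x^r, & \text{if } |x| \le A;\\ A^r, & \text{if } x > A;\\ (-A)^r, & \text{if } x < -A. \end{cases} \tag{3}$$
Then $f_A \in C_B$, hence by Theorem 4.4.4 the “truncated moments” converge:
$$\int_{-\infty}^{\infty} f_A(x)\,dF_n(x) \to \int_{-\infty}^{\infty} f_A(x)\,dF(x).$$
Next we have
$$\int_{-\infty}^{\infty} |f_A(x) - x^r|\,dF_n(x) \le \int_{|x| > A} |x|^r\,dF_n(x) = \int_{|X_n| > A} |X_n|^r\,d\mathcal{P} \le \frac{1}{A^{p-r}} \int_\Omega |X_n|^p\,d\mathcal{P} \le \frac{M}{A^{p-r}}.$$
The last term does not depend on $n$, and converges to zero as $A \to \infty$. It
follows that as $A \to \infty$, $\int_{-\infty}^{\infty} f_A\,dF_n$ converges uniformly in $n$ to $\int_{-\infty}^{\infty} x^r\,dF_n$.
Hence by a standard theorem on the inversion of repeated limits, we have
$$\int_{-\infty}^{\infty} x^r\,dF = \lim_{A \to \infty} \int_{-\infty}^{\infty} f_A\,dF = \lim_{A \to \infty} \lim_{n \to \infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n \to \infty} \lim_{A \to \infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n \to \infty} \int_{-\infty}^{\infty} x^r\,dF_n. \tag{4}$$
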
We now introduce the concept of uniform integrability, which is of basic 
importance in this connection. It is also an essential hypothesis in certain 
convergence questions arising in the theory of martingales (to be treated in 
Chapter 9). 


DEFINITION OF UNIFORM INTEGRABILITY. A family of r.v.'s $\{X_t\}$, $t \in T$,
where $T$ is an arbitrary index set, is said to be uniformly integrable iff
$$\lim_{A \to \infty} \int_{|X_t| > A} |X_t|\,d\mathcal{P} = 0 \tag{5}$$
uniformly in $t \in T$.


Theorem 4.5.3. The family $\{X_t\}$ is uniformly integrable if and only if the
following two conditions are satisfied:

(a) $\mathcal{E}(|X_t|)$ is bounded in $t \in T$;
(b) For every $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that for any $E \in \mathcal{F}$:
$$\mathcal{P}(E) < \delta(\epsilon) \Rightarrow \int_E |X_t|\,d\mathcal{P} < \epsilon \quad \text{for every } t \in T.$$

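PROOF. Clearly (5) implies (a). Next, let $E \in \mathcal{F}$ and write $E_t$ for the set
$\{\omega : |X_t(\omega)| > A\}$. We have by the mean value theorem
$$\int_E |X_t|\,d\mathcal{P} = \left(\int_{E \cap E_t} + \int_{E \setminus E_t}\right) |X_t|\,d\mathcal{P} \le \int_{E_t} |X_t|\,d\mathcal{P} + A\,\mathcal{P}(E).$$
Given $\epsilon > 0$, there exists $A = A(\epsilon)$ such that the last-written integral is less
than $\epsilon/2$ for every $t$, by (5). Hence (b) will follow if we set $\delta = \epsilon/2A$. Thus
(5) implies (b).

Conversely, suppose that (a) and (b) are true. Then by the Chebyshev
inequality we have for every $t$,
$$\mathcal{P}\{|X_t| > A\} \le \frac{\mathcal{E}(|X_t|)}{A} \le \frac{M}{A},$$
where $M$ is the bound indicated in (a). Hence if $A > M/\delta$, then $\mathcal{P}(E_t) < \delta$
and we have by (b):
$$\int_{E_t} |X_t|\,d\mathcal{P} < \epsilon.$$
Thus (5) is true.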


Theorem 4.5.4. Let $0 < r < \infty$, $X_n \in L^r$, and $X_n \to X$ in pr. Then the
following three propositions are equivalent:

(i) $\{|X_n|^r\}$ is uniformly integrable;
(ii) $X_n \to X$ in $L^r$;
(iii) $\mathscr{E}(|X_n|^r) \to \mathscr{E}(|X|^r) < \infty$.


PROOF. Suppose (i) is true; since $X_n \to X$ in pr., by Theorem 4.2.3 there
exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ a.e. By Theorem 4.5.1 and (a)
above, $X \in L^r$. The easy inequality

$$|X_n - X|^r \le 2^r \{ |X_n|^r + |X|^r \},$$

valid for all $r > 0$, together with Theorem 4.5.3 then implies that the sequence
$\{|X_n - X|^r\}$ is also uniformly integrable. For each $\epsilon > 0$, we have

$$\int_{\Omega} |X_n - X|^r\,d\mathscr{P} = \int_{|X_n - X| > \epsilon} |X_n - X|^r\,d\mathscr{P} + \int_{|X_n - X| \le \epsilon} |X_n - X|^r\,d\mathscr{P} \le \int_{|X_n - X| > \epsilon} |X_n - X|^r\,d\mathscr{P} + \epsilon^r. \tag{6}$$

Since $\mathscr{P}\{|X_n - X| > \epsilon\} \to 0$ as $n \to \infty$ by hypothesis, it follows from (b)
above that the last written integral in (6) also tends to zero. This being true
for every $\epsilon > 0$, (ii) follows.

Suppose (ii) is true, then we have (iii) by the second assertion of 
Theorem 4.5.1. 

Finally suppose (iii) is true. To prove (i), let $A > 0$ and consider a function
$f_A$ in $C_K$ satisfying the following conditions:

$$f_A(x) \begin{cases} = |x|^r & \text{for } |x|^r \le A; \\ \le |x|^r & \text{for } A < |x|^r \le A + 1; \\ = 0 & \text{for } |x|^r > A + 1; \end{cases}$$

cf. the proof of Theorem 4.4.1 for the construction of such a function. Hence
we have

$$\varliminf_{n\to\infty} \int_{|X_n|^r \le A+1} |X_n|^r\,d\mathscr{P} \ge \lim_{n\to\infty} \mathscr{E}\{f_A(X_n)\} = \mathscr{E}\{f_A(X)\} \ge \int_{|X|^r \le A} |X|^r\,d\mathscr{P},$$

where the inequalities follow from the shape of $f_A$, while the limit relation
in the middle follows as in the proof of Theorem 4.4.5. Subtracting from the limit
relation in (iii), we obtain

$$\varlimsup_{n\to\infty} \int_{|X_n|^r > A+1} |X_n|^r\,d\mathscr{P} \le \int_{|X|^r > A} |X|^r\,d\mathscr{P}.$$

The last integral does not depend on $n$ and converges to zero as $A \to \infty$. This
means: for any $\epsilon > 0$, there exists $A_0 = A_0(\epsilon)$ and $n_0 = n_0(A_0(\epsilon))$ such that
we have

$$\sup_{n > n_0} \int_{|X_n|^r > A+1} |X_n|^r\,d\mathscr{P} < \epsilon$$

provided that $A \ge A_0$. Since each $|X_n|^r$ is integrable, there exists $A_1 = A_1(\epsilon)$
such that the supremum above may be taken over all $n \ge 1$ provided that
$A \ge A_0 \vee A_1$. This establishes (i), and completes the proof of the theorem.


In the remainder of this section the term “moment” will be restricted to a 
moment of positive integral order. It is well known (see Exercise 5 of Sec. 6.6) 
that on $(\mathscr{U}, \mathscr{B})$ any p.m. or equivalently its d.f. is uniquely determined by
its moments of all orders. Precisely, if $F_1$ and $F_2$ are two d.f.’s such that
$F_i(0) = 0$, $F_i(1) = 1$ for $i = 1, 2$; and

$$\forall n \ge 1: \quad \int_0^1 x^n\,dF_1(x) = \int_0^1 x^n\,dF_2(x),$$

then $F_1 = F_2$. The corresponding result is false in $(\mathscr{R}^1, \mathscr{B}^1)$ and a further
condition on the moments is required to ensure uniqueness. The sufficient
condition due to Carleman is as follows:

$$\sum_{r=1}^{\infty} \frac{1}{m_{2r}^{1/2r}} = +\infty,$$

where $m_r$ denotes the moment of order $r$. When a given sequence of numbers
$\{m_r, r \ge 1\}$ uniquely determines a d.f. $F$ such that

$$m_r = \int_{-\infty}^{\infty} x^r\,dF(x), \tag{7}$$

we say that “the moment problem is determinate” for the sequence. Of course 
an arbitrary sequence of numbers need not be the sequence of moments for 
any d.f.; a necessary but far from sufficient condition, for example, is that 
the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these 
questions here but shall content ourselves with the useful result below, which 
is often referred to as the “method of moments”; see also Theorem 6.4.5. 
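
Carleman's condition can be explored numerically. The sketch below is an illustration under stated assumptions, not part of the text: it uses the standard facts that the standard normal d.f. has even moments $m_{2r} = (2r-1)!!$ and that the standard lognormal d.f. (an example chosen here, whose moment problem is known to be indeterminate) has $m_{2r} = e^{2r^2}$. The function names are hypothetical. Since the condition is only sufficient, the bounded partial sums in the lognormal case show merely that the criterion does not apply there.

```python
import math

def normal_even_moment_log(r):
    """log of m_{2r} = (2r-1)!! = (2r)! / (2^r r!) for the standard normal d.f."""
    return math.lgamma(2 * r + 1) - r * math.log(2.0) - math.lgamma(r + 1)

def lognormal_even_moment_log(r):
    """log of m_{2r} = exp((2r)^2 / 2) = exp(2 r^2) for the standard lognormal d.f."""
    return 2.0 * r * r

def carleman_partial_sum(log_m2r, R):
    """Sum over r = 1..R of m_{2r}^(-1/(2r)), computed from log-moments
    to avoid overflow."""
    return sum(math.exp(-log_m2r(r) / (2.0 * r)) for r in range(1, R + 1))

for R in (10, 100, 10_000):
    print(R,
          carleman_partial_sum(normal_even_moment_log, R),
          carleman_partial_sum(lognormal_even_moment_log, R))
```

In the normal case the terms behave like a constant times $r^{-1/2}$, so the partial sums grow without bound; in the lognormal case the terms are exactly $e^{-r}$ and the series converges.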


Theorem 4.5.5. Suppose there is a unique d.f. $F$ with the moments $\{m^{(r)}, r \ge 1\}$,
all finite. Suppose that $\{F_n\}$ is a sequence of d.f.’s, each of which has all
its moments finite:

$$m_n^{(r)} = \int_{-\infty}^{\infty} x^r\,dF_n.$$

Finally, suppose that for every $r \ge 1$:

$$\lim_{n\to\infty} m_n^{(r)} = m^{(r)}. \tag{8}$$

Then $F_n \xrightarrow{v} F$.


PROOF. Let $\mu_n$ be the p.m. corresponding to $F_n$. By Theorem 4.3.3 there
exists a subsequence of $\{\mu_n\}$ that converges vaguely. Let $\{\mu_{n_k}\}$ be any
subsequence converging vaguely to some $\mu$. We shall show that $\mu$ is indeed a
p.m. with the d.f. $F$. By the Chebyshev inequality, we have for each $A > 0$:

$$\mu_{n_k}(-A, +A) \ge 1 - A^{-2} m_{n_k}^{(2)}.$$

Since $m_{n_k}^{(2)} \to m^{(2)} < \infty$, it follows that as $A \to \infty$, the left side converges
uniformly in $k$ to one. Letting $A \to \infty$ along a sequence of points such that
both $\pm A$ belong to the dense set $D$ involved in the definition of vague convergence,
we obtain as in (4) above:

$$\mu(\mathscr{R}^1) = \lim_{A\to\infty} \mu(-A, +A) = \lim_{A\to\infty} \lim_{k\to\infty} \mu_{n_k}(-A, +A) = \lim_{k\to\infty} \lim_{A\to\infty} \mu_{n_k}(-A, +A) = \lim_{k\to\infty} \mu_{n_k}(\mathscr{R}^1) = 1.$$

Now for each $r$, let $p$ be the next larger even integer. We have

$$\int_{-\infty}^{\infty} x^p\,d\mu_{n_k} = m_{n_k}^{(p)} \to m^{(p)},$$

hence $m_{n_k}^{(p)}$ is bounded in $k$. It follows from Theorem 4.5.2 that

$$\int_{-\infty}^{\infty} x^r\,d\mu_{n_k} \to \int_{-\infty}^{\infty} x^r\,d\mu.$$

But the left side also converges to $m^{(r)}$ by (8). Hence by the uniqueness
hypothesis $\mu$ is the p.m. determined by $F$. We have therefore proved that every
vaguely convergent subsequence of $\{\mu_n\}$, or equivalently $\{F_n\}$, has the same
limit $\mu$, or equivalently $F$. Hence the theorem follows from Theorem 4.3.4.


EXERCISES 


1. If $\sup_n |X_n| \in L^p$ and $X_n \to X$ a.e., then $X \in L^p$ and $X_n \to X$ in $L^p$.

2. If $\{X_n\}$ is dominated by some $Y$ in $L^p$, and converges in dist. to $X$,
then $\mathscr{E}(|X_n|^p) \to \mathscr{E}(|X|^p)$.

3. If $X_n \to X$ in dist., and $f \in C$, then $f(X_n) \to f(X)$ in dist.

*4. Exercise 3 may be reduced to Exercise 10 of Sec. 4.1 as follows. Let
$F_n$, $1 \le n \le \infty$, be d.f.’s such that $F_n \xrightarrow{v} F_\infty$. Let $\theta$ be uniformly distributed on
$[0, 1]$ and put $X_n = F_n^{-1}(\theta)$, $1 \le n \le \infty$, where $F_n^{-1}(y) = \sup\{x : F_n(x) \le y\}$
(cf. Exercise 4 of Sec. 3.1). Then $X_n$ has d.f. $F_n$ and $X_n \to X_\infty$ in pr.

5. Find the moments of the normal d.f. $\Phi$ and the positive normal d.f.
$\Phi^+$ below:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy, \qquad \Phi^+(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-y^2/2}\,dy, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$$

Show that in either case the moments satisfy Carleman’s condition.

6. If $\{X_t\}$ and $\{Y_t\}$ are uniformly integrable, then so is $\{X_t + Y_t\}$ and
$\{X_t + Y_t\}$.

7. If $\{X_n\}$ is dominated by some $Y$ in $L^1$ or if it is identically distributed
with finite mean, then it is uniformly integrable.

*8. If $\sup_n \mathscr{E}(|X_n|^p) < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.

9. If $\{X_n\}$ is uniformly integrable, then the sequence

$$\left\{ \frac{1}{n} \sum_{j=1}^{n} X_j,\ n \ge 1 \right\}$$

is uniformly integrable.

*10. Suppose the distributions of $\{X_n, 1 \le n \le \infty\}$ are absolutely continuous
with densities $\{g_n\}$ such that $g_n \to g_\infty$ in Lebesgue measure. Then
$g_n \to g_\infty$ in $L^1(-\infty, \infty)$, and consequently for every bounded Borel measurable
function $f$ we have $\mathscr{E}\{f(X_n)\} \to \mathscr{E}\{f(X_\infty)\}$. [HINT: $\int (g_\infty - g_n)^+\,dx =
\int (g_\infty - g_n)^-\,dx$ and $(g_\infty - g_n)^+ \le g_\infty$; use dominated convergence.]


5 Law of large numbers. 
Random series 


5.1 Simple limit theorems 


The various concepts of Chapter 4 will be applied to the so-called “law of 
large numbers” —a famous name in the theory of probability. This has to do 
with partial sums

$$S_n = \sum_{j=1}^{n} X_j$$

of a sequence of r.v.’s. In the most classical formulation, the “weak” or the
“strong” law of large numbers is said to hold for the sequence according as

$$\frac{S_n - \mathscr{E}(S_n)}{n} \to 0 \tag{1}$$

in pr. or a.e. This, of course, presupposes the finiteness of $\mathscr{E}(S_n)$. A natural
generalization is as follows:

$$\frac{S_n - a_n}{b_n} \to 0,$$

where $\{a_n\}$ is a sequence of real numbers and $\{b_n\}$ a sequence of positive
numbers tending to infinity. We shall present several stages of the development,
even though they overlap each other to some extent, for it is just as important
to learn the basic techniques as the results themselves.

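The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to
which if $Z_n$ is any sequence of r.v.’s, then $\mathscr{E}(Z_n^2) \to 0$ implies that $Z_n \to 0$
in pr. and $Z_{n_k} \to 0$ a.e. for a subsequence $\{n_k\}$. Applied to $Z_n = S_n/n$, the
first assertion becomes

$$\mathscr{E}(S_n^2) = o(n^2) \Rightarrow \frac{S_n}{n} \to 0 \text{ in pr.} \tag{2}$$

Now we can calculate $\mathscr{E}(S_n^2)$ more explicitly as follows:

$$\mathscr{E}(S_n^2) = \mathscr{E}\left( \Big( \sum_{j=1}^{n} X_j \Big)^2 \right) = \mathscr{E}\left( \sum_{j=1}^{n} X_j^2 + 2 \sum_{1 \le j < k \le n} X_j X_k \right) = \sum_{j=1}^{n} \mathscr{E}(X_j^2) + 2 \sum_{1 \le j < k \le n} \mathscr{E}(X_j X_k). \tag{3}$$

Observe that there are $n^2$ terms above, so that even if all of them are bounded
by a fixed constant, only $\mathscr{E}(S_n^2) = O(n^2)$ will result, which falls critically short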
of the hypothesis in (2). The idea then is to introduce certain assumptions to 
cause enough cancellation among the “mixed terms” in (3). A salient feature 
of probability theory and its applications is that such assumptions are not only 
permissible but realistic. We begin with the simplest of its kind. 


DEFINITION. Two r.v.’s $X$ and $Y$ are said to be uncorrelated iff both have
finite second moments and

$$\mathscr{E}(XY) = \mathscr{E}(X)\mathscr{E}(Y). \tag{4}$$

They are said to be orthogonal iff (4) is replaced by

$$\mathscr{E}(XY) = 0. \tag{5}$$

The r.v.’s of any family are said to be uncorrelated [orthogonal] iff every two
of them are.

Note that (4) is equivalent to

$$\mathscr{E}\{(X - \mathscr{E}(X))(Y - \mathscr{E}(Y))\} = 0,$$

which reduces to (5) when $\mathscr{E}(X) = \mathscr{E}(Y) = 0$. The requirement of finite
second moments seems unnecessary, but it does ensure the finiteness of
$\mathscr{E}(XY)$ (Cauchy–Schwarz inequality!) as well as that of $\mathscr{E}(X)$ and $\mathscr{E}(Y)$, and
without it the definitions are hardly useful. Finally, it is obvious that pairwise
independence implies uncorrelatedness, provided second moments are finite.
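
The equivalence noted above between (4) and the vanishing of the centered product moment can be checked by a one-line expansion; the derivation below is a worked sketch using only linearity of expectation and the finiteness of the moments involved.

```latex
\begin{align*}
\mathscr{E}\{(X - \mathscr{E}(X))(Y - \mathscr{E}(Y))\}
  &= \mathscr{E}(XY) - \mathscr{E}(X)\mathscr{E}(Y) - \mathscr{E}(X)\mathscr{E}(Y) + \mathscr{E}(X)\mathscr{E}(Y) \\
  &= \mathscr{E}(XY) - \mathscr{E}(X)\mathscr{E}(Y).
\end{align*}
```

Thus the left member vanishes exactly when $\mathscr{E}(XY) = \mathscr{E}(X)\mathscr{E}(Y)$.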

If $\{X_n\}$ is a sequence of uncorrelated r.v.’s, then the sequence $\{X_n -
\mathscr{E}(X_n)\}$ is orthogonal, and for the latter (3) reduces to the fundamental relation
below:

$$\sigma^2(S_n) = \sum_{j=1}^{n} \sigma^2(X_j), \tag{6}$$

which may be called the “additivity of the variance”. Conversely, the validity
of (6) for $n = 2$ implies that $X_1$ and $X_2$ are uncorrelated. There are only $n$
terms on the right side of (6), hence if these are bounded by a fixed constant
we have now $\sigma^2(S_n) = O(n) = o(n^2)$. Thus (2) becomes applicable, and we
have proved the following result.


Theorem 5.1.1. If the $X_j$’s are uncorrelated and their second moments have
a common bound, then (1) is true in $L^2$ and hence also in pr.
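
The argument behind Theorem 5.1.1 rests on the bound $\mathscr{P}\{|S_n| > n\epsilon\} \le M/(n\epsilon^2)$, which follows from (6) and Chebyshev's inequality. The following sketch compares this bound with the exact probability in one special case chosen here as an assumption: the $X_j$ are independent fair $\pm 1$ signs, so that $M = 1$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def exact_tail(n, eps):
    """P{|S_n| > n*eps} for S_n a sum of n independent fair +/-1 signs."""
    # S_n = 2H - n, where H counts the +1 signs and is Binomial(n, 1/2).
    favorable = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) > n * eps)
    return Fraction(favorable, 2 ** n)

def chebyshev_bound(n, eps, M=1):
    """The bound M / (n eps^2), from sigma^2(S_n) <= M n."""
    return Fraction(M) / (n * Fraction(eps) ** 2)

eps = Fraction(1, 10)
for n in (100, 400, 1600):
    print(n, float(exact_tail(n, eps)), float(chebyshev_bound(n, eps)))
```

Both columns tend to zero, the exact probability much faster; the bound is crude but requires only uncorrelatedness and bounded second moments.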


This simple theorem is actually due to Chebyshev, who invented his
famous inequalities for its proof. The next result, due to Rajchman (1932),
strengthens the conclusion by proving convergence a.e. This result is interesting
by virtue of its simplicity, and serves well to introduce an important
method, that of taking subsequences.


Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds 
also a.e. 


PROOF. Without loss of generality we may suppose that $\mathscr{E}(X_j) = 0$ for
each $j$, so that the $X_j$’s are orthogonal. We have by (6):

$$\mathscr{E}(S_n^2) \le Mn,$$

where $M$ is a bound for the second moments. It follows by Chebyshev’s
inequality that for each $\epsilon > 0$ we have

$$\mathscr{P}\{|S_n| > n\epsilon\} \le \frac{Mn}{n^2 \epsilon^2} = \frac{M}{n \epsilon^2}.$$

If we sum this over $n$, the resulting series on the right diverges. However, if
we confine ourselves to the subsequence $\{n^2\}$, then

$$\sum_n \mathscr{P}\{|S_{n^2}| > n^2 \epsilon\} \le \sum_n \frac{M}{n^2 \epsilon^2} < \infty.$$

Hence by Theorem 4.2.1 (Borel–Cantelli) we have

$$\mathscr{P}\{|S_{n^2}| > n^2 \epsilon \text{ i.o.}\} = 0; \tag{7}$$

and consequently by Theorem 4.2.2

$$\frac{S_{n^2}}{n^2} \to 0 \quad \text{a.e.} \tag{8}$$

We have thus proved the desired result for a subsequence; and the “method
of subsequences” aims in general at extending to the whole sequence a result
proved (relatively easily) for a subsequence. In the present case we must show
that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference.
Put for each $n \ge 1$:

$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$

Then we have

$$\mathscr{E}(D_n^2) \le 2n\,\mathscr{E}(|S_{(n+1)^2} - S_{n^2}|^2) = 2n \sum_{j=n^2+1}^{(n+1)^2} \sigma^2(X_j) \le 4n^2 M$$

and consequently by Chebyshev’s inequality

$$\mathscr{P}\{D_n > n^2 \epsilon\} \le \frac{4M}{\epsilon^2 n^2}.$$

It follows as before that

$$\frac{D_n}{n^2} \to 0 \quad \text{a.e.} \tag{9}$$

Now it is clear that (8) and (9) together imply (1), since

$$\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{n^2}$$

for $n^2 \le k < (n+1)^2$. The theorem is proved.
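
The reason for passing to the subsequence $\{n^2\}$ can be seen numerically. The sketch below, with hypothetical function names and the arbitrary normalization $M = \epsilon = 1$, compares partial sums of the Chebyshev bounds $M/(n\epsilon^2)$ along the full sequence with those of $M/(n^2\epsilon^2)$ along the squares, and also shows how each index $k$ is bracketed by consecutive squares as in the last step of the proof.

```python
import math

def partial_sum_full(N, M=1.0, eps=1.0):
    """Sum over n = 1..N of the bounds M / (n eps^2) for S_n."""
    return sum(M / (n * eps ** 2) for n in range(1, N + 1))

def partial_sum_squares(N, M=1.0, eps=1.0):
    """Sum over n = 1..N of the bounds M / (n^2 eps^2) for S_{n^2}."""
    return sum(M / (n ** 2 * eps ** 2) for n in range(1, N + 1))

def bracketing_index(k):
    """The unique n with n^2 <= k < (n+1)^2, for k >= 1."""
    return math.isqrt(k)

for N in (10, 1_000, 100_000):
    print(N, partial_sum_full(N), partial_sum_squares(N))
```

The first column of sums grows like $\log N$, so the Borel–Cantelli lemma cannot be applied to the full sequence with these bounds; the second stays below $\pi^2/6$.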


The hypotheses of Theorems 5.1.1 and 5.1.2 are certainly satisfied for a 
sequence of independent r.v.’s that are uniformly bounded or that are identi- 
cally distributed with a finite second moment. The most celebrated, as well as 
the very first case of the strong law of large numbers, due to Borel (1909), is 
formulated in terms of the so-called “normal numbers.” Let each real number 
in [0, 1] be expanded in the usual decimal system: 


$$\omega = .x_1 x_2 \ldots x_n \ldots \tag{10}$$

Except for the countable set of terminating decimals, for which there are two
distinct expansions, this representation is unique. Fix a $k$: $0 \le k \le 9$, and let
$\nu_k^{(n)}(\omega)$ denote the number of digits among the first $n$ digits of $\omega$ that are
equal to $k$. Then $\nu_k^{(n)}(\omega)/n$ is the relative frequency of the digit $k$ in the first
$n$ places, and the limit, if existing:

$$\lim_{n\to\infty} \frac{\nu_k^{(n)}(\omega)}{n} = \varphi_k(\omega) \tag{11}$$

may be called the frequency of $k$ in $\omega$. The number is called simply normal
(to the scale 10) iff this limit exists for each $k$ and is equal to 1/10. Intuitively
all ten possibilities should be equally likely for each digit of a number picked 
“at random”. On the other hand, one can write down “at random” any number
of numbers that are “abnormal” according to the definition given, such as
$.1111\ldots$, while it is a relatively difficult matter to name even one normal
number in the sense of Exercise 5 below. It turns out that the number

$$.12345678910111213\ldots,$$

which is obtained by writing down in succession all the natural numbers in
the decimal system, is a normal number to the scale 10 even in the stringent
definition of Exercise 5 below, but the proof is not so easy. As for
determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are
normal, the problem seems beyond the reach of our present capability for
mathematics. In spite of these difficulties, Borel’s theorem below asserts that 
in a perfectly precise sense almost every number is normal. Furthermore, this 
striking proposition is merely a very particular case of Theorem 5.1.2 above. 
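
For the number $.12345678910111213\ldots$ mentioned above, the relative frequencies of the digits can be tabulated directly. The sketch below is only an illustration with hypothetical function names; it writes down the digits of all integers from 1 through a chosen cutoff and counts them. At any finite cutoff the frequencies are still visibly uneven (the digit 0 in particular is underrepresented), so the computation illustrates the definition rather than proving anything about the limit.

```python
from collections import Counter

def champernowne_digits(last):
    """Decimal digits of .123456789101112... through the integer `last`."""
    return "".join(str(i) for i in range(1, last + 1))

def relative_frequencies(digits):
    """Relative frequency of each digit k = 0..9 in the given digit string."""
    counts = Counter(digits)
    n = len(digits)
    return {k: counts[str(k)] / n for k in range(10)}

digits = champernowne_digits(99_999)
freqs = relative_frequencies(digits)
for k in range(10):
    print(k, round(freqs[k], 5))
```

With the cutoff 99999 each nonzero digit occurs equally often, because every nonzero digit appears the same number of times in each decimal position among the integers below $10^5$.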


Theorem 5.1.3. Except for a Borel set of measure zero, every number in 
[0, 1] is simply normal. 


PROOF. Consider the probability space $(\mathscr{U}, \mathscr{B}, m)$ in Example 2 of
Sec. 2.2. Let $Z$ be the subset of the form $m/10^n$ for integers $n \ge 1$, $m \ge 1$,
then $m(Z) = 0$. If $\omega \in \mathscr{U} \setminus Z$, then it has a unique decimal expansion; if $\omega \in Z$,
it has two such expansions, but we agree to use the “terminating” one for the
sake of definiteness. Thus we have

$$\omega = .\xi_1 \xi_2 \ldots \xi_n \ldots,$$

where for each $n \ge 1$, $\xi_n(\cdot)$ is a Borel measurable function of $\omega$. Just as in
Example 4 of Sec. 3.3, the sequence $\{\xi_n, n \ge 1\}$ is a sequence of independent
r.v.’s with

$$\mathscr{P}\{\xi_n = k\} = \frac{1}{10}, \quad k = 0, 1, \ldots, 9.$$

Indeed according to Theorem 5.1.2 we need only verify that the $\xi_n$’s are
uncorrelated, which is a very simple matter. For a fixed $k$ we define the
r.v. $X_n$ to be the indicator of the set $\{\omega : \xi_n(\omega) = k\}$, then $\mathscr{E}(X_n) = 1/10$,
$\mathscr{E}(X_n^2) = 1/10$, and

$$\frac{1}{n} \sum_{j=1}^{n} X_j(\omega)$$

is the relative frequency of the digit $k$ in the first $n$ places of the decimal for
$\omega$. According to Theorem 5.1.2, we have then

$$\frac{S_n}{n} \to \frac{1}{10} \quad \text{a.e.}$$

Hence in the notation of (11), we have $\mathscr{P}\{\varphi_k = 1/10\} = 1$ for each $k$ and
consequently also

$$\mathscr{P}\left\{ \bigcap_{k=0}^{9} \Big[ \varphi_k = \frac{1}{10} \Big] \right\} = 1,$$

which means that the set of normal numbers has Borel measure one.
Theorem 5.1.3 is proved.


The preceding theorem makes a deep impression (at least on the older 
generation!) because it interprets a general proposition in probability theory 
at a most classical and fundamental level. If we use the intuitive language of 
probability such as coin-tossing, the result sounds almost trite. For it merely
says that if an unbiased coin is tossed indefinitely, the limiting frequency of
“heads” will be equal to $\frac{1}{2}$, that is, its a priori probability. A mathematician
who is unacquainted with and therefore skeptical of probability theory tends
to regard the last statement as either “obvious” or “unprovable”, but he can
scarcely question the authenticity of Borel’s theorem about ordinary decimals. 
As a matter of fact, the proof given above, essentially Borel’s own, is a 
lot easier than a straightforward measure-theoretic version, deprived of the 
intuitive content [see, e.g., Hardy and Wright, An introduction to the theory of 
numbers, 3rd. ed., Oxford University Press, Inc., New York, 1954]. 


EXERCISES 


1. For any sequence of r.v.’s $\{X_n\}$, if $\mathscr{E}(X_n^2) \to 0$, then (1) is true in pr.
but not necessarily a.e.

*2. Theorem 5.1.2 may be sharpened as follows: under the same hypotheses
we have $S_n/n^{\alpha} \to 0$ a.e. for any $\alpha > \frac{3}{4}$.

3. Theorem 5.1.2 remains true if the hypothesis of bounded second moments
is weakened to: $\sigma^2(X_n) = O(n^{\theta})$ where $0 \le \theta < \frac{1}{2}$. Various combinations
of Exercises 2 and 3 are possible.

*4. If $\{X_n\}$ are independent r.v.’s such that the fourth moments $\mathscr{E}(X_n^4)$
have a common bound, then (1) is true a.e. [This is Cantelli’s strong law of
large numbers. Without using Theorem 5.1.2 we may operate with $\mathscr{E}(S_n^4/n^4)$
as we did with $\mathscr{E}(S_n^2/n^2)$. Note that the full strength of independence is not
needed.]

5. We may strengthen the definition of a normal number by considering
blocks of digits. Let $r \ge 1$, and consider the successive overlapping blocks of
$r$ consecutive digits in a decimal; there are $n - r + 1$ such blocks in the first
$n$ places. Let $\nu^{(n)}(\omega)$ denote the number of such blocks that are identical with
a given one; for example, if $r = 5$, the given block may be “21212”. Prove
that for a.e. $\omega$, we have for every $r$:

$$\lim_{n\to\infty} \frac{\nu^{(n)}(\omega)}{n} = \frac{1}{10^r}.$$


[HINT: Reduce the problem to disjoint blocks, which are independent. ] 


*6. The above definition may be further strengthened if we consider different
scales of expansion. A real number in $[0, 1]$ is said to be completely
normal iff the relative frequency of each block of length $r$ in the scale $s$ tends
to the limit $1/s^r$ for every $s$ and $r$. Prove that almost every number in $[0, 1]$
is completely normal.

7. Let a be completely normal. Show that by looking at the expansion 
of a in some scale we can rediscover the complete works of Shakespeare 
from end to end without a single misprint or interruption. [This is Borel’s 
paradox.] 

*8. Let X be an arbitrary r.v. with an absolutely continuous distribution. 
Prove that with probability one the fractional part of X is a normal number. 
[HINT: Let N be the set of normal numbers and consider P{X — [X] € N}.] 


9. Prove that the set of real numbers in $[0, 1]$ whose decimal expansions
do not contain the digit 2 is of measure zero. Deduce from this the existence
of two sets $A$ and $B$ both of measure zero such that every real number is
representable as a sum $a + b$ with $a \in A$, $b \in B$.

*10. Is the sum of two normal numbers, modulo 1, normal? Is the product?
[HINT: Consider the differences between a fixed abnormal number and all
normal numbers: this is a set of probability one.]


5.2 Weak law of large numbers 


The law of large numbers in the form (1) of Sec. 5.1 involves only the first 
moment, but so far we have operated with the second. In order to drop any 
assumption on the second moment, we need a new device, that of “equivalent
sequences”, due to Khintchine (1894-1959).




DEFINITION. Two sequences of r.v.’s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent
iff

$$\sum_n \mathscr{P}\{X_n \ne Y_n\} < \infty. \tag{1}$$


In practice, an equivalent sequence is obtained by “truncating” in various 
ways, as we shall see presently. 


Theorem 5.2.1. If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then

$$\sum_n (X_n - Y_n) \quad \text{converges a.e.}$$

Furthermore if $a_n \uparrow \infty$, then

$$\frac{1}{a_n} \sum_{j=1}^{n} (X_j - Y_j) \to 0 \quad \text{a.e.} \tag{2}$$


PROOF. By the Borel–Cantelli lemma, (1) implies that

$$\mathscr{P}\{X_n \ne Y_n \text{ i.o.}\} = 0.$$

This means that there exists a null set $N$ with the following property: if
$\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that

$$n \ge n_0(\omega) \Rightarrow X_n(\omega) = Y_n(\omega).$$

Thus for such an $\omega$, the two numerical sequences $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ
only in a finite number of terms (how many depending on $\omega$). In other words,
the series

$$\sum_n (X_n(\omega) - Y_n(\omega))$$

consists of zeros from a certain point on. Both assertions of the theorem are
trivial consequences of this fact.


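Corollary. With probability one, the expression

$$\sum_n X_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} X_j$$

converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as

$$\sum_n Y_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} Y_j,$$

respectively. In particular, if

$$\frac{1}{a_n} \sum_{j=1}^{n} X_j$$

converges to $X$ in pr., then so does

$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j.$$

To prove the last assertion of the corollary, observe that by Theorem 4.1.2
the relation (2) holds also in pr. Hence if

$$\frac{1}{a_n} \sum_{j=1}^{n} X_j \to X \quad \text{in pr.},$$

then we have

$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j = \frac{1}{a_n} \sum_{j=1}^{n} X_j + \frac{1}{a_n} \sum_{j=1}^{n} (Y_j - X_j) \to X + 0 = X \quad \text{in pr.}$$

(see Exercise 3 of Sec. 4.1).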


The next law of large numbers is due to Khintchine. Under the stronger 
hypothesis of total independence, it will be proved again by an entirely 
different method in Chapter 6. 


Theorem 5.2.2. Let $\{X_n\}$ be pairwise independent and identically distributed
r.v.’s with finite mean $m$. Then we have

$$\frac{S_n}{n} \to m \quad \text{in pr.} \tag{3}$$

n 
PROOF. Let the common d.f. be $F$ so that

$$m = \mathscr{E}(X_n) = \int_{-\infty}^{\infty} x\,dF(x), \qquad \mathscr{E}(|X_n|) = \int_{-\infty}^{\infty} |x|\,dF(x) < \infty.$$

By Theorem 3.2.1 the finiteness of $\mathscr{E}(|X_1|)$ is equivalent to

$$\sum_n \mathscr{P}(|X_1| > n) < \infty.$$

Hence we have, since the $X_n$’s have the same distribution:

$$\sum_n \mathscr{P}(|X_n| > n) < \infty. \tag{4}$$

We introduce a sequence of r.v.’s $\{Y_n\}$ by “truncating at $n$”:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le n; \\ 0, & \text{if } |X_n(\omega)| > n. \end{cases}$$

This is equivalent to $\{X_n\}$ by (4), since $\mathscr{P}(|X_n| > n) = \mathscr{P}(X_n \ne Y_n)$. Let

$$T_n = \sum_{j=1}^{n} Y_j.$$

By the corollary above, (3) will follow if (and only if) we can prove $T_n/n \to
m$ in pr. Now the $Y_n$’s are also pairwise independent by Theorem 3.3.1
(applied to each pair), hence they are uncorrelated, since each, being bounded,
has a finite second moment. Let us calculate $\sigma^2(T_n)$; we have by (6) of
Sec. 5.1,

$$\sigma^2(T_n) = \sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} \mathscr{E}(Y_j^2) = \sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x).$$

The crudest estimate of the last term yields

$$\sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x) \le \sum_{j=1}^{n} j \int_{|x| \le j} |x|\,dF(x) \le \frac{n(n+1)}{2} \int_{-\infty}^{\infty} |x|\,dF(x),$$

which is $O(n^2)$, but not $o(n^2)$ as required by (2) of Sec. 5.1. To improve
on it, let $\{a_n\}$ be a sequence of integers such that $0 < a_n < n$, $a_n \to \infty$ but
$a_n = o(n)$. We have

$$\begin{aligned}
\sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x) &= \sum_{j \le a_n} + \sum_{a_n < j \le n} \\
&\le \sum_{j \le a_n} a_n \int_{|x| \le a_n} |x|\,dF(x) + \sum_{a_n < j \le n} a_n \int_{|x| \le a_n} |x|\,dF(x) + \sum_{a_n < j \le n} n \int_{a_n < |x| \le n} |x|\,dF(x) \\
&\le n a_n \int_{-\infty}^{\infty} |x|\,dF(x) + n^2 \int_{|x| > a_n} |x|\,dF(x).
\end{aligned}$$

The first term is $O(n a_n) = o(n^2)$; and the second is $n^2 o(1) = o(n^2)$, since
the set $\{x : |x| > a_n\}$ decreases to the empty set and so the last-written integral
above converges to zero. We have thus proved that $\sigma^2(T_n) = o(n^2)$ and
consequently, by (2) of Sec. 5.1,

$$\frac{T_n - \mathscr{E}(T_n)}{n} = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - \mathscr{E}(Y_j)\} \to 0 \quad \text{in pr.}$$

Now it is clear that as $n \to \infty$, $\mathscr{E}(Y_n) \to \mathscr{E}(X) = m$; hence also

$$\frac{1}{n} \sum_{j=1}^{n} \mathscr{E}(Y_j) \to m.$$

It follows that

$$\frac{T_n}{n} = \frac{1}{n} \sum_{j=1}^{n} Y_j \to m \quad \text{in pr.},$$

as was to be proved.
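
The key estimate in the proof, that the sum of the truncated second moments is $o(n^2)$ even when the variance is infinite, can be checked in closed form for a particular d.f. The example below is an assumption made here for illustration and is not taken from the text: $F$ has density $2x^{-3}$ on $[1, \infty)$, so that the mean is 2, the second moment is infinite, and $\mathscr{E}(Y_j^2) = \int_1^j x^2 \cdot 2x^{-3}\,dx = 2 \log j$. The function names are hypothetical.

```python
import math

def truncated_second_moment(j):
    """E(Y_j^2) for the assumed density 2 x^{-3} on [1, oo): equals 2 log j."""
    return 2.0 * math.log(j)

def variance_bound_ratio(n):
    """(1/n^2) * sum_{j<=n} E(Y_j^2), an upper bound for sigma^2(T_n) / n^2."""
    return sum(truncated_second_moment(j) for j in range(1, n + 1)) / n ** 2

for n in (10, 100, 10_000):
    print(n, variance_bound_ratio(n))
```

Here the sum is $2 \log n!$, of order $n \log n$, so the ratio tends to zero roughly like $2 (\log n)/n$, in agreement with the general $o(n^2)$ bound.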


For totally independent r.v.’s, necessary and sufficient conditions for 
the weak law of large numbers in the most general formulation, due to 
Kolmogorov and Feller, are known. The sufficiency of the following criterion
is easily proved, but we omit the proof of its necessity (cf. Gnedenko and
Kolmogorov [12]).


Theorem 5.2.3. Let $\{X_n\}$ be a sequence of independent r.v.’s with d.f.’s
$\{F_n\}$; and $S_n = \sum_{j=1}^{n} X_j$. Let $\{b_n\}$ be a given sequence of real numbers
increasing to $+\infty$.

Suppose that we have

(i) $\sum_{j=1}^{n} \int_{|x| > b_n} dF_j(x) = o(1)$,

(ii) $\dfrac{1}{b_n^2} \sum_{j=1}^{n} \int_{|x| \le b_n} x^2\,dF_j(x) = o(1)$;

then if we put

$$a_n = \sum_{j=1}^{n} \int_{|x| \le b_n} x\,dF_j(x), \tag{5}$$

we have

$$\frac{1}{b_n} (S_n - a_n) \to 0 \quad \text{in pr.} \tag{6}$$

Next suppose that the $F_n$’s have the property that there exists a $\lambda > 0$
such that

$$\forall n: \quad F_n(0) \ge \lambda, \quad 1 - F_n(0-) \ge \lambda. \tag{7}$$

Then if (6) holds for the given $\{b_n\}$ and any sequence of real numbers $\{a_n\}$,
the conditions (i) and (ii) must hold.

Remark. Condition (7) may be written as

$$\mathscr{P}\{X_n \le 0\} \ge \lambda, \quad \mathscr{P}\{X_n \ge 0\} \ge \lambda;$$

when $\lambda = \frac{1}{2}$ this means that 0 is a median for each $X_n$; see Exercise 9 below.
In general it ensures that none of the distributions is too far off center, and it
is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below.

It is possible to replace the $a_n$ in (5) by

$$\sum_{j=1}^{n} \int_{|x| \le b_j} x\,dF_j(x)$$

and maintain (6); see Exercise 8 below.


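PROOF OF SUFFICIENCY. Define for each $n \ge 1$ and $1 \le j \le n$:

$$Y_{n,j} = \begin{cases} X_j, & \text{if } |X_j| \le b_n; \\ 0, & \text{if } |X_j| > b_n; \end{cases}$$

and write

$$T_n = \sum_{j=1}^{n} Y_{n,j}.$$

Then condition (i) may be written as

$$\sum_{j=1}^{n} \mathscr{P}\{Y_{n,j} \ne X_j\} = o(1);$$

and it follows from Boole’s inequality that

$$\mathscr{P}\{T_n \ne S_n\} \le \mathscr{P}\left\{ \bigcup_{j=1}^{n} (Y_{n,j} \ne X_j) \right\} \le \sum_{j=1}^{n} \mathscr{P}\{Y_{n,j} \ne X_j\} = o(1). \tag{8}$$

Next, condition (ii) may be written as

$$\sum_{j=1}^{n} \mathscr{E}\left( \Big( \frac{Y_{n,j}}{b_n} \Big)^2 \right) = o(1);$$

from which it follows, since $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.’s:

$$\sigma^2\left( \frac{T_n}{b_n} \right) = \sum_{j=1}^{n} \sigma^2\left( \frac{Y_{n,j}}{b_n} \right) \le \sum_{j=1}^{n} \mathscr{E}\left( \Big( \frac{Y_{n,j}}{b_n} \Big)^2 \right) = o(1).$$

Hence as in (2) of Sec. 5.1,

$$\frac{T_n - \mathscr{E}(T_n)}{b_n} \to 0 \quad \text{in pr.} \tag{9}$$

It is clear (why?) that (8) and (9) together imply

$$\frac{S_n - \mathscr{E}(T_n)}{b_n} \to 0 \quad \text{in pr.}$$

Since

$$\mathscr{E}(T_n) = \sum_{j=1}^{n} \mathscr{E}(Y_{n,j}) = \sum_{j=1}^{n} \int_{|x| \le b_n} x\,dF_j(x) = a_n,$$

(6) is proved.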


As an application of Theorem 5.2.3 we give an example where the weak 
but not the strong law of large numbers holds. 


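Example. Let $\{X_n\}$ be independent r.v.’s with a common d.f. $F$ such that

$$\mathscr{P}\{X_1 = n\} = \mathscr{P}\{X_1 = -n\} = \frac{c}{n^2 \log n}, \quad n = 3, 4, \ldots,$$

where $c$ is the constant

$$\frac{1}{2} \left( \sum_{n=3}^{\infty} \frac{1}{n^2 \log n} \right)^{-1}.$$

We have then, for large values of $n$, counting the two atoms at $\pm k$ together,

$$n \int_{|x| > n} dF(x) = n \sum_{k > n} \frac{2c}{k^2 \log k} \sim \frac{2c}{\log n},$$

$$\frac{1}{n} \int_{|x| \le n} x^2\,dF(x) = \frac{1}{n} \sum_{k=3}^{n} \frac{2c\,k^2}{k^2 \log k} \sim \frac{2c}{\log n}.$$

Thus conditions (i) and (ii) are satisfied with $b_n = n$; and we have $a_n = 0$ by (5).
Hence $S_n/n \to 0$ in pr. in spite of the fact that $\mathscr{E}(|X_1|) = +\infty$. On the other hand,
we have

$$\mathscr{P}\{|X_1| > n\} \sim \frac{2c}{n \log n},$$

so that, since $X_1$ and $X_n$ have the same d.f.,

$$\sum_n \mathscr{P}\{|X_n| > n\} = \sum_n \mathscr{P}\{|X_1| > n\} = \infty.$$

Hence by Theorem 4.2.4 (Borel–Cantelli),

$$\mathscr{P}\{|X_n| > n \text{ i.o.}\} = 1.$$

But $|S_n - S_{n-1}| = |X_n| > n$ implies $|S_n| > n/2$ or $|S_{n-1}| > n/2$; it follows that

$$\mathscr{P}\left\{ |S_n| > \frac{n}{2} \text{ i.o.} \right\} = 1,$$

and so it is certainly false that $S_n/n \to 0$ a.e. However, we can prove more. For any
$A > 0$, the same argument as before yields

$$\mathscr{P}\{|X_n| > An \text{ i.o.}\} = 1$$

and consequently

$$\mathscr{P}\left\{ |S_n| > \frac{An}{2} \text{ i.o.} \right\} = 1.$$

This means that for each $A$ there is a null set $Z(A)$ such that if $\omega \in \Omega \setminus Z(A)$, then

$$\varlimsup_{n\to\infty} \frac{|S_n(\omega)|}{n} \ge \frac{A}{2}. \tag{10}$$

Let $Z = \bigcup_{m=1}^{\infty} Z(m)$; then $Z$ is still a null set, and if $\omega \in \Omega \setminus Z$, (10) is true for every
$A$, and therefore the upper limit is $+\infty$. Since $X$ is “symmetric” in the obvious sense,
it follows that

$$\varliminf_{n\to\infty} \frac{S_n}{n} = -\infty, \quad \varlimsup_{n\to\infty} \frac{S_n}{n} = +\infty \quad \text{a.e.}$$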

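EXERCISES

$$S_n = \sum_{j=1}^{n} X_j.$$

1. For any sequence of r.v.’s $\{X_n\}$, and any $p \ge 1$:

$$X_n \to 0 \text{ a.e.} \Rightarrow \frac{S_n}{n} \to 0 \text{ a.e.},$$

$$X_n \to 0 \text{ in } L^p \Rightarrow \frac{S_n}{n} \to 0 \text{ in } L^p.$$

The second result is false for $p < 1$.

2. Even for a sequence of independent r.v.’s $\{X_n\}$,

$$X_n \to 0 \text{ in pr.} \not\Rightarrow \frac{S_n}{n} \to 0 \text{ in pr.}$$

[HINT: Let $X_n$ take the values $2^n$ and 0 with probabilities $n^{-1}$ and $1 - n^{-1}$.]

3. For any sequence $\{X_n\}$:

$$\frac{S_n}{n} \to 0 \text{ in pr.} \Rightarrow \frac{X_n}{n} \to 0 \text{ in pr.}$$

More generally, this is true if $n$ is replaced by $b_n$, where $b_{n+1}/b_n \to 1$.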
*4. For any $\delta > 0$, we have

$$\lim_{n\to\infty} \sum_{|k - np| > n\delta} \binom{n}{k} p^k (1 - p)^{n-k} = 0$$

uniformly in $p$: $0 < p < 1$.

*5. Let $\mathscr{P}(X_1 = 2^n) = 1/2^n$, $n \ge 1$; and let $\{X_n, n \ge 1\}$ be independent
and identically distributed. Show that the weak law of large numbers does not
hold for $b_n = n$; namely, with this choice of $b_n$ no sequence $\{a_n\}$ exists for
which (6) is true. [This is the St. Petersburg paradox, in which you win $2^n$ if
it takes $n$ tosses of a coin to obtain a head. What would you consider as a
fair entry fee? And what is your mathematical expectation?]

*6. Show on the contrary that a weak law of large numbers does hold for
$b_n = n \log n$ and find the corresponding $a_n$. [HINT: Apply Theorem 5.2.3.]

7. Conditions (i) and (ii) in Theorem 5.2.3 imply that for any $\delta > 0$,

$$\sum_{j=1}^{n} \int_{|x| > \delta b_n} dF_j(x) = o(1)$$

and that $a_n = o(\sqrt{n}\,b_n)$.

8. They also imply that

$$\frac{1}{b_n} \sum_{j=1}^{n} \int_{b_j < |x| \le b_n} x\,dF_j(x) = o(1).$$

[HINT: Use the first part of Exercise 7 and divide the interval of integration
$b_j < |x| \le b_n$ into parts of the form $\lambda^k < |x| \le \lambda^{k+1}$ with $\lambda > 1$.]

9. A median of the r.v. $X$ is any number $\alpha$ such that

$$\mathscr{P}\{X \le \alpha\} \ge \frac{1}{2}, \quad \mathscr{P}\{X \ge \alpha\} \ge \frac{1}{2}.$$

Show that such a number always exists but need not be unique.

*10. Let $\{X_n, 1 \le n \le \infty\}$ be arbitrary r.v.’s and for each $n$ let $m_n$ be a
median of $X_n$. Prove that if $X_n \to X_\infty$ in pr. and $m_\infty$ is unique, then $m_n \to
m_\infty$. Furthermore, if there exists any sequence of real numbers $\{c_n\}$ such that
$X_n - c_n \to 0$ in pr., then $X_n - m_n \to 0$ in pr.

11. Derive the following form of the weak law of large numbers from
Theorem 5.2.3. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$.
Then there exists $\{a_n\}$ for which (6) holds but condition (i) does not.

12. Theorem 5.2.2 may be slightly generalized as follows. Let $\{X_n\}$ be
pairwise independent with a common d.f. $F$ such that

$$\text{(i)} \int_{|x| \le n} x\,dF(x) = o(1), \qquad \text{(ii)}\ n \int_{|x| > n} dF(x) = o(1);$$

then $S_n/n \to 0$ in pr.

13. Let $\{X_n\}$ be a sequence of identically distributed strictly positive
random variables. For any $\varphi$ such that $\varphi(n)/n \to 0$ as $n \to \infty$, show that
$\mathscr{P}\{S_n > \varphi(n) \text{ i.o.}\} = 1$, and so $S_n \to \infty$ a.e. [HINT: Let $N_n$ denote the number
of $k \le n$ such that $X_k \le \varphi(n)/n$. Use Chebyshev’s inequality to estimate
$\mathscr{P}\{N_n > n/2\}$ and so conclude $\mathscr{P}\{S_n > \varphi(n)/2\} \ge 1 - 2F(\varphi(n)/n)$. This problem
was proposed as a teaser and the rather unexpected solution was given
by Kesten.]

14. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then
there exists $\{a_n\}$ for which (6) holds, but condition (i) does not hold. Thus
condition (7) cannot be omitted.


5.3 Convergence of series 


If the terms of an infinite series are independent r.v.’s, then it will be shown 
in Sec. 8.1 that the probability of its convergence is either zero or one. Here 
we shall establish a concrete criterion for the latter alternative. Not only is 
the result a complete answer to the question of convergence of independent 
r.v.’s, but it yields also a satisfactory form of the strong law of large numbers. 
This theorem is due to Kolmogorov (1929). We begin with his two remarkable 
inequalities. The first is also very useful elsewhere; the second may be circum- 
vented (see Exercises 3 to 5 below), but it is given here in Kolmogorov’s 
original form as an example of true virtuosity. 


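Theorem 5.3.1. Let $\{X_n\}$ be independent r.v.’s such that

$$\forall n: \quad \mathscr{E}(X_n) = 0, \quad \mathscr{E}(X_n^2) = \sigma^2(X_n) < \infty.$$

Then we have for every $\epsilon > 0$:

$$\mathscr{P}\left\{ \max_{1 \le j \le n} |S_j| > \epsilon \right\} \le \frac{\sigma^2(S_n)}{\epsilon^2}. \tag{1}$$

Remark. If we replace the $\max_{1 \le j \le n} |S_j|$ in the formula by $|S_n|$, this
becomes a simple case of Chebyshev’s inequality, of which it is thus an
essential improvement.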
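
The inequality (1) can be verified by exhaustive enumeration in a small case. The sketch below is an illustration under assumptions made here: the $X_j$ are independent fair $\pm 1$ steps, so that $\sigma^2(S_n) = n$, and the values $n = 12$, $\epsilon = 5$ and the function name are arbitrary choices. It computes both $\mathscr{P}\{\max_j |S_j| > \epsilon\}$ and $\mathscr{P}\{|S_n| > \epsilon\}$ exactly over all $2^n$ paths and compares them with the common bound $\sigma^2(S_n)/\epsilon^2$.

```python
from fractions import Fraction
from itertools import product

def walk_probabilities(n, eps):
    """Return (P{max_j |S_j| > eps}, P{|S_n| > eps}) for n fair +/-1 steps,
    by enumerating all 2^n equally likely paths."""
    hit_max = hit_end = 0
    for steps in product((-1, 1), repeat=n):
        s = 0
        running_max = 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        hit_max += running_max > eps   # the maximum exceeded eps at some step
        hit_end += abs(s) > eps        # the final sum exceeds eps
    total = 2 ** n
    return Fraction(hit_max, total), Fraction(hit_end, total)

n, eps = 12, 5
p_max, p_end = walk_probabilities(n, eps)
bound = Fraction(n, eps ** 2)   # sigma^2(S_n) / eps^2 with sigma^2(X_j) = 1
print(float(p_end), float(p_max), float(bound))
```

The probability for the maximum is larger than that for the endpoint, yet both are controlled by the same right member, which is the content of the remark above.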
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PROOF. Fix € > 0. For any w in the set 


A = {w: max |S ;(w)| > €}, 
<jx<n 
let us define 


vw) = min{j:1 < j <n, |S;(@)| > é}. 
Clearly v is an r.v. with domain A. Put 


Ay = {w: v(@) =k} = {or max [S;(@)| < €, |S(@)| > €}, 
<j<k- 
where for k = 1, max;<j;<o |S;(@)| is taken to be zero. Thus v is the “first 


time” that the indicated maximum exceeds €, and A; is the event that this 
occurs “for the first time at the kth step”. The A,;’s are disjoint and we have 


seks 
k=1 


It follows that 


(2) [say | s.aP => | [Se + (Sy — Se) dP 
. kat 7 Ak kai” Ak 


=> | Sk +2500 — 51) + Gn — Se”, 
k= Y Ak 


Let g, denote the indicator of A,, then the two r.v.’s gS; and S,, — S, are 
independent by Theorem 3.3.2, and consequently (see Exercise 9 of Sec. 3.3) 


| S;(Sp — Sp) dP = / (:5:)(Sn — SAP 
Ax Q 


= | oxsia f (Sy 5:47 = 0, 
Q Q 


since the last-written integral is 
n 


E(Sn-~Srv= D> &(Xj) =0. 


j=k+1 
Using this in (2), we obtain 


#S.)= | srar> | rar>y | Si dP 
Q A ya YAR 


> ES P(Ay) = PP(A), 
k=1 


5.3 CONVERGENCE OF SERIES | 123 


where the last inequality is by the mean value theorem, since $|S_k| > \epsilon$
on $\Lambda_k$ by definition. The theorem now follows upon dividing the
inequality above by $\epsilon^2$.
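
As an informal numerical illustration of inequality (1), and not part of the
proof, the following sketch estimates the probability on the left by
simulation and compares it with the bound on the right. The choice of fair
$\pm 1$ steps, the function name `kolmogorov_check`, and all parameter values
are assumptions made only for this example.

```python
import random

def kolmogorov_check(n=50, eps=10.0, trials=5000, seed=0):
    """Estimate P{max_j |S_j| > eps} for fair +/-1 steps; return (estimate, bound)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        s = 0
        running_max = 0
        for _ in range(n):
            s += rng.choice((-1, 1))          # X_j = +/-1: mean 0, variance 1
            running_max = max(running_max, abs(s))
        if running_max > eps:
            exceed += 1
    bound = n / eps ** 2                      # sigma^2(S_n) / eps^2 with sigma^2(S_n) = n
    return exceed / trials, bound

estimate, bound = kolmogorov_check()
print(f"estimated probability {estimate:.3f}, Kolmogorov bound {bound:.3f}")
```

The estimate should fall below the bound, usually by a wide margin, since (1)
holds for every distribution with the given variances.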


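Theorem 5.3.2. Let $\{X_n\}$ be independent r.v.’s with finite means and
suppose that there exists an $A$ such that

$$\forall n: |X_n - \mathscr{E}(X_n)| \le A < \infty. \tag{3}$$

Then for every $\epsilon > 0$ we have

$$\mathscr{P}\Bigl\{\max_{1 \le j \le n} |S_j| \le \epsilon\Bigr\} \le \frac{(2A + 4\epsilon)^2}{\sigma^2(S_n)}. \tag{4}$$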


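PROOF. Let $M_0 = \Omega$, and for $1 \le k \le n$:

$$M_k = \Bigl\{\omega: \max_{1 \le j \le k} |S_j| \le \epsilon\Bigr\}, \qquad \Delta_k = M_{k-1} - M_k.$$

We may suppose that $\mathscr{P}(M_n) > 0$, for otherwise (4) is trivial.
Furthermore, let $S_0' = 0$ and for $k \ge 1$,

$$X_k' = X_k - \mathscr{E}(X_k), \qquad S_k' = \sum_{j=1}^{k} X_j'.$$

Define numbers $a_k$, $0 \le k \le n$, as follows:

$$a_k = \frac{1}{\mathscr{P}(M_k)} \int_{M_k} S_k' \, d\mathscr{P},$$

so that

$$\int_{M_k} (S_k' - a_k) \, d\mathscr{P} = 0. \tag{5}$$

Now we write

$$\begin{aligned}
\int_{M_{k+1}} (S_{k+1}' - a_{k+1})^2 \, d\mathscr{P} &= \int_{M_k} (S_k' - a_k + a_k - a_{k+1} + X_{k+1}')^2 \, d\mathscr{P} \\
&\quad - \int_{\Delta_{k+1}} (S_k' - a_k + a_k - a_{k+1} + X_{k+1}')^2 \, d\mathscr{P}
\end{aligned} \tag{6}$$

and denote the two integrals on the right by $I_1$ and $I_2$, respectively.
Using the definition of $M_k$ and (3), we have

$$\begin{aligned}
|S_k' - a_k| &= \Bigl|S_k - \mathscr{E}(S_k) - \frac{1}{\mathscr{P}(M_k)} \int_{M_k} [S_k - \mathscr{E}(S_k)] \, d\mathscr{P}\Bigr| \\
&= \Bigl|S_k - \frac{1}{\mathscr{P}(M_k)} \int_{M_k} S_k \, d\mathscr{P}\Bigr| \le |S_k| + \epsilon; \\
|a_k - a_{k+1}| &= \Bigl|\frac{1}{\mathscr{P}(M_k)} \int_{M_k} S_k \, d\mathscr{P} - \frac{1}{\mathscr{P}(M_{k+1})} \int_{M_{k+1}} S_k \, d\mathscr{P}
- \frac{1}{\mathscr{P}(M_{k+1})} \int_{M_{k+1}} X_{k+1}' \, d\mathscr{P}\Bigr| \le 2\epsilon + A.
\end{aligned} \tag{7}$$

It follows that, since $|S_k| \le \epsilon$ on $\Delta_{k+1}$,

$$I_2 \le \int_{\Delta_{k+1}} (|S_k| + \epsilon + 2\epsilon + A + A)^2 \, d\mathscr{P} \le (4\epsilon + 2A)^2 \mathscr{P}(\Delta_{k+1}).$$

On the other hand, we have

$$\begin{aligned}
I_1 = \int_{M_k} \{&(S_k' - a_k)^2 + (a_k - a_{k+1})^2 + X_{k+1}'^2 + 2(S_k' - a_k)(a_k - a_{k+1}) \\
&+ 2(S_k' - a_k)X_{k+1}' + 2(a_k - a_{k+1})X_{k+1}'\} \, d\mathscr{P}.
\end{aligned}$$

The integrals of the last three terms all vanish by (5) and independence, hence

$$I_1 \ge \int_{M_k} (S_k' - a_k)^2 \, d\mathscr{P} + \int_{M_k} X_{k+1}'^2 \, d\mathscr{P}
= \int_{M_k} (S_k' - a_k)^2 \, d\mathscr{P} + \mathscr{P}(M_k) \sigma^2(X_{k+1}).$$

Substituting into (6), and using $M_k \supset M_n$, we obtain for
$0 \le k \le n - 1$:

$$\int_{M_{k+1}} (S_{k+1}' - a_{k+1})^2 \, d\mathscr{P} - \int_{M_k} (S_k' - a_k)^2 \, d\mathscr{P}
\ge \mathscr{P}(M_n) \sigma^2(X_{k+1}) - (4\epsilon + 2A)^2 \mathscr{P}(\Delta_{k+1}).$$

Summing over $k$ and using (7) again for $k = n$:

$$4\epsilon^2 \mathscr{P}(M_n) \ge \int_{M_n} (|S_n| + \epsilon)^2 \, d\mathscr{P} \ge \int_{M_n} (S_n' - a_n)^2 \, d\mathscr{P}
\ge \mathscr{P}(M_n) \sum_{j=1}^{n} \sigma^2(X_j) - (4\epsilon + 2A)^2 \mathscr{P}(\Omega \setminus M_n),$$

hence

$$(2A + 4\epsilon)^2 \ge \mathscr{P}(M_n) \sum_{j=1}^{n} \sigma^2(X_j),$$

which is (4).

We can now prove the “three series theorem” of Kolmogorov (1929).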


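Theorem 5.3.3. Let $\{X_n\}$ be independent r.v.’s and define for a fixed
constant $A > 0$:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le A; \\ 0, & \text{if } |X_n(\omega)| > A. \end{cases}$$

Then the series $\sum_n X_n$ converges a.e. if and only if the following three
series all converge:

(i) $\sum_n \mathscr{P}\{|X_n| > A\} = \sum_n \mathscr{P}\{X_n \ne Y_n\}$,
(ii) $\sum_n \mathscr{E}(Y_n)$,
(iii) $\sum_n \sigma^2(Y_n)$.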
PROOF. Suppose that the three series converge. Applying Theorem 5.3.1
to the sequence $\{Y_n - \mathscr{E}(Y_n)\}$, we have for every $m \ge 1$:

$$\mathscr{P}\Bigl\{\max_{n \le k \le n'} \Bigl|\sum_{j=n}^{k} \{Y_j - \mathscr{E}(Y_j)\}\Bigr| \le \frac{1}{m}\Bigr\} \ge 1 - m^2 \sum_{j=n}^{n'} \sigma^2(Y_j).$$

If we denote the probability on the left by $\mathscr{P}(m, n, n')$, it follows
from the convergence of (iii) that for each $m$:

$$\lim_{n \to \infty} \lim_{n' \to \infty} \mathscr{P}(m, n, n') = 1.$$

This means that the tail of $\sum_n \{Y_n - \mathscr{E}(Y_n)\}$ converges to zero
a.e., so that the series converges a.e. Since (ii) converges, so does
$\sum_n Y_n$. Since (i) converges, $\{X_n\}$ and $\{Y_n\}$ are equivalent
sequences; hence $\sum_n X_n$ also converges a.e. by Theorem 5.2.1. We have
therefore proved the “if” part of the theorem.

Conversely, suppose that $\sum_n X_n$ converges a.e. Then for each $A > 0$:

$$\mathscr{P}\{|X_n| > A \text{ i.o.}\} = 0.$$

It follows from the Borel–Cantelli lemma that the series (i) must converge.
Hence as before $\sum_n Y_n$ also converges a.e. But since
$|Y_n - \mathscr{E}(Y_n)| \le 2A$, we have by Theorem 5.3.2

$$\mathscr{P}\Bigl\{\max_{n \le k \le n'} \Bigl|\sum_{j=n}^{k} Y_j\Bigr| \le 1\Bigr\} \le \frac{(4A + 4)^2}{\sum_{j=n}^{n'} \sigma^2(Y_j)}.$$

Were the series (iii) to diverge, the probability above would tend to zero as
$n' \to \infty$ for each $n$, hence the tail of $\sum_n Y_n$ almost surely would
not be bounded by 1, so the series could not converge. This contradiction
proves that (iii) must converge. Finally, consider the series
$\sum_n \{Y_n - \mathscr{E}(Y_n)\}$ and apply the proven part of the theorem to it.
We have $\mathscr{P}\{|Y_n - \mathscr{E}(Y_n)| > 2A\} = 0$ and
$\mathscr{E}(Y_n - \mathscr{E}(Y_n)) = 0$ so that the two series corresponding to (i)
and (ii) with $2A$ for $A$ vanish identically, while (iii) has just been shown
to converge. It follows from the sufficiency of the criterion that
$\sum_n \{Y_n - \mathscr{E}(Y_n)\}$ converges a.e., and so by equivalence the same
is true of $\sum_n \{X_n - \mathscr{E}(Y_n)\}$. Since $\sum_n X_n$ also converges
a.e. by hypothesis, we conclude by subtraction that the series (ii) converges.
This completes the proof of the theorem.


The convergence of a series of r.v.’s is, of course, by definition the same
as that of its partial sums, in any sense of convergence discussed in
Chapter 4. For series of independent terms we have, however, the following
theorem due to Paul Lévy.


Theorem 5.3.4. If $\{X_n\}$ is a sequence of independent r.v.’s, then the
convergence of the series $\sum_n X_n$ in pr. is equivalent to its convergence
a.e.


PROOF. By Theorem 4.1.2, it is sufficient to prove that convergence of
$\sum_n X_n$ in pr. implies its convergence a.e. Suppose the former; then, given
$\epsilon: 0 < \epsilon < 1$, there exists $m_0$ such that if $n > m > m_0$, we have

$$\mathscr{P}\{|S_{m,n}| > \epsilon\} < \epsilon, \tag{8}$$

where

$$S_{m,n} = \sum_{j=m+1}^{n} X_j.$$

It is obvious that for $m < k \le n$ we have

$$\bigcup_{k=m+1}^{n} \Bigl\{\max_{m < j \le k-1} |S_{m,j}| \le 2\epsilon;\ |S_{m,k}| > 2\epsilon;\ |S_{k,n}| \le \epsilon\Bigr\} \subset \{|S_{m,n}| > \epsilon\}, \tag{9}$$

where the sets in the union are disjoint. Going to probabilities and using
independence, we obtain

$$\sum_{k=m+1}^{n} \mathscr{P}\Bigl\{\max_{m < j \le k-1} |S_{m,j}| \le 2\epsilon;\ |S_{m,k}| > 2\epsilon\Bigr\} \mathscr{P}\{|S_{k,n}| \le \epsilon\} \le \mathscr{P}\{|S_{m,n}| > \epsilon\}.$$

If we suppress the factors $\mathscr{P}\{|S_{k,n}| \le \epsilon\}$, then the sum is
equal to

$$\mathscr{P}\Bigl\{\max_{m < j \le n} |S_{m,j}| > 2\epsilon\Bigr\}$$

(cf. the beginning of the proof of Theorem 5.3.1). It follows that

$$\mathscr{P}\Bigl\{\max_{m < j \le n} |S_{m,j}| > 2\epsilon\Bigr\} \min_{m < k \le n} \mathscr{P}\{|S_{k,n}| \le \epsilon\} \le \mathscr{P}\{|S_{m,n}| > \epsilon\}.$$

This inequality is due to Ottaviani. By (8), the second factor on the left
exceeds $1 - \epsilon$, hence if $m > m_0$,

$$\mathscr{P}\Bigl\{\max_{m < j \le n} |S_{m,j}| > 2\epsilon\Bigr\} \le \frac{1}{1 - \epsilon} \mathscr{P}\{|S_{m,n}| > \epsilon\} < \frac{\epsilon}{1 - \epsilon}. \tag{10}$$

Letting $n \to \infty$, then $m \to \infty$, and finally $\epsilon \to 0$ through a
sequence of values, we see that the triple limit of the first probability in
(10) is equal to zero. This proves the convergence a.e. of $\sum_n X_n$ by
Exercise 8 of Sec. 4.2.


It is worthwhile to observe the underlying similarity between the inequal- 
ities (10) and (1). Both give a “probability upper bound” for the maximum of 
a number of summands in terms of the last summand as in (10), or its variance 
as in (1). The same principle will be used again later. 

Let us remark also that even the convergence of $S_n$ in dist. is equivalent
to that in pr. or a.e. as just proved; see Theorem 9.5.5.

We give some examples of convergence of series. 


Example. $\sum_n \pm 1/n$.

This is meant to be the “harmonic series” with a random choice of signs in each
term, the choices being totally independent and equally likely to be $+$ or $-$
in each case. More precisely, it is the series

$$\sum_n \frac{X_n}{n},$$

where $\{X_n, n \ge 1\}$ is a sequence of independent, identically distributed
r.v.’s taking the values $\pm 1$ with probability $\frac{1}{2}$ each.

We may take $A = 1$ in Theorem 5.3.3 so that the two series (i) and (ii) vanish
identically. Since the $n$th term $X_n/n$ and its truncation $Y_n$ both have
variance $1/n^2$, the series (iii) converges. Hence, $\sum_n \pm 1/n$ converges
a.e. by the criterion above. The same conclusion applies to
$\sum_n \pm 1/n^{\theta}$ if $\frac{1}{2} < \theta \le 1$. Clearly there is no
absolute convergence. For $0 \le \theta \le \frac{1}{2}$ the probability of
convergence is zero by the same criterion and by Theorem 8.1.2 below.
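
The example can be explored numerically. The sketch below is only an
illustration under assumed names (`random_sign_partial_sums`,
`variance_series`) and an arbitrary seed: it records partial sums of
$\sum_n X_n/n^{\theta}$ at chosen indices and evaluates partial sums of the
series (iii), which equals $\sum_n n^{-2\theta}$ here.

```python
import random

def random_sign_partial_sums(theta, checkpoints, seed=0):
    """Partial sums of sum_n X_n / n**theta at the given indices, X_n = +/-1 fair signs."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    sums, s = {}, 0.0
    for n in range(1, max(checkpoints) + 1):
        s += rng.choice((-1.0, 1.0)) / n ** theta
        if n in wanted:
            sums[n] = s
    return sums

def variance_series(theta, n_terms):
    """Partial sum of series (iii): the n-th term has variance n**(-2 * theta)."""
    return sum(n ** (-2.0 * theta) for n in range(1, n_terms + 1))

sums = random_sign_partial_sums(1.0, [10_000, 20_000])
print("theta = 1:   partial sums", sums, "variance sum", variance_series(1.0, 10_000))
print("theta = 1/2: variance sum", variance_series(0.5, 10_000))
```

For $\theta = 1$ the two recorded partial sums should nearly agree and the
variance sum stays bounded, while for $\theta = \frac{1}{2}$ the variance sum
grows like the harmonic series.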


EXERCISES 


1. Theorem 5.3.1 has the following “one-sided” analogue. Under the
same hypotheses, we have

$$\mathscr{P}\Bigl\{\max_{1 \le j \le n} S_j \ge \epsilon\Bigr\} \le \frac{\sigma^2(S_n)}{\epsilon^2 + \sigma^2(S_n)}.$$

[This is due to A. W. Marshall.]

*2. Let $\{X_n\}$ be independent and identically distributed with mean 0 and
variance 1. Then we have for every $x$:

$$\mathscr{P}\Bigl\{\max_{1 \le j \le n} S_j \ge x\Bigr\} \le 2\mathscr{P}\{S_n \ge x - \sqrt{2n}\}.$$

[HINT: Let

$$\Lambda_k = \Bigl\{\max_{1 \le j < k} S_j < x;\ S_k \ge x\Bigr\};$$

then $\sum_{k=1}^{n} \mathscr{P}\{\Lambda_k;\ S_n - S_k \ge -\sqrt{2n}\} \le \mathscr{P}\{S_n \ge x - \sqrt{2n}\}$.]
3. Theorem 5.3.2 has the following companion, which is easier to prove.
Under the joint hypotheses in Theorems 5.3.1 and 5.3.2, we have

$$\mathscr{P}\Bigl\{\max_{1 \le j \le n} |S_j| \le \epsilon\Bigr\} \le \frac{(A + \epsilon)^2}{\sigma^2(S_n)}.$$

4. Let $\{X_n, X_n', n \ge 1\}$ be independent r.v.’s such that $X_n$ and $X_n'$
have the same distribution. Suppose further that all these r.v.’s are bounded
by the same constant $A$. Then

$$\sum_n (X_n - X_n')$$

converges a.e. if and only if

$$\sum_n \sigma^2(X_n) < \infty.$$

Use Exercise 3 to prove this without recourse to Theorem 5.3.3, and so finish
the converse part of Theorem 5.3.3.

*5. But neither Theorem 5.3.2 nor the alternative indicated in the preceding
exercise is necessary; what we need is merely the following result, which
is an easy consequence of a general theorem in Chapter 7. Let $\{X_n\}$ be a
sequence of independent and uniformly bounded r.v.’s with
$\sigma^2(S_n) \to \infty$. Then for every $A > 0$ we have

$$\lim_{n \to \infty} \mathscr{P}\{|S_n| \le A\} = 0.$$

Show that this is sufficient to finish the proof of Theorem 5.3.3.


*6. The following analogue of the inequalities of Kolmogorov and Ottaviani
is due to P. Lévy. Let $S_n$ be the sum of $n$ independent r.v.’s and
$S_n^0 = S_n - m_0(S_n)$, where $m_0(S_n)$ is a median of $S_n$. Then we have

$$\mathscr{P}\Bigl\{\max_{1 \le j \le n} |S_j^0| > \epsilon\Bigr\} \le 3\mathscr{P}\Bigl\{|S_n^0| > \frac{\epsilon}{2}\Bigr\}.$$

[HINT: Try “4” in place of “3” on the right.]

7. For arbitrary $\{X_n\}$, if

$$\sum_n \mathscr{E}(|X_n|) < \infty,$$

then $\sum_n X_n$ converges absolutely a.e.

8. Let $\{X_n\}$, where $n = 0, \pm 1, \pm 2, \ldots$, be independent and
identically distributed according to the normal distribution $\Phi$ (see
Exercise 5 of Sec. 4.5). Then the series of complex-valued r.v.’s

$$xX_0 + \sum_{n=1}^{\infty} \frac{e^{inx} X_n}{in} + \sum_{n=1}^{\infty} \frac{e^{-inx} X_{-n}}{-in},$$

where $i = \sqrt{-1}$ and $x$ is real, converges a.e. and uniformly in $x$. (This
is Wiener’s representation of the Brownian motion process.)

*9. Let $\{X_n\}$ be independent and identically distributed, taking the values
0 and 2 with probability $\frac{1}{2}$ each; then

$$\sum_{n=1}^{\infty} \frac{X_n}{3^n}$$

converges a.e. Prove that the limit has the Cantor d.f. discussed in Sec. 1.3.
Do Exercise 11 in that section again; it is easier now.

*10. If $\sum_n \pm X_n$ converges a.e. for all choices of $\pm 1$, where the
$X_n$’s are arbitrary r.v.’s, then $\sum_n X_n^2$ converges a.e. [HINT: Consider
$\sum_n r_n(t) X_n(\omega)$ where the $r_n$’s are coin-tossing r.v.’s and apply
Fubini’s theorem to the space of $(t, \omega)$.]


5.4 Strong law of large numbers 


To return to the strong law of large numbers, the link is furnished by the 
following lemma on “summability”. 


Kronecker’s lemma. Let $\{x_k\}$ be a sequence of real numbers, $\{a_k\}$ a
sequence of numbers $> 0$ and $\uparrow \infty$. Then

$$\sum_n \frac{x_n}{a_n} \text{ converges} \ \Rightarrow\ \frac{1}{a_n} \sum_{j=1}^{n} x_j \to 0.$$

PROOF. For $1 \le n \le \infty$ let

$$b_n = \sum_{j=1}^{n} \frac{x_j}{a_j}.$$

If we also write $a_0 = 0$, $b_0 = 0$, we have

$$x_n = a_n(b_n - b_{n-1})$$

and

$$\frac{1}{a_n} \sum_{j=1}^{n} x_j = \frac{1}{a_n} \sum_{j=1}^{n} a_j(b_j - b_{j-1}) = b_n - \frac{1}{a_n} \sum_{j=0}^{n-1} b_j(a_{j+1} - a_j)$$

(Abel’s method of partial summation). Since $a_{j+1} - a_j \ge 0$,

$$\frac{1}{a_n} \sum_{j=0}^{n-1} (a_{j+1} - a_j) = 1,$$

and $b_n \to b_\infty$, we have

$$\frac{1}{a_n} \sum_{j=1}^{n} x_j \to b_\infty - b_\infty = 0.$$

The lemma is proved.
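
A small numerical check may make the lemma concrete. The sequences below,
$x_j = (-1)^j \sqrt{j}$ and $a_j = j$, are an assumed example chosen so that
$\sum_j x_j/a_j$ is a convergent alternating series; the function name
`kronecker_demo` is likewise hypothetical. The sketch computes $b_n$, the
average $(1/a_n) \sum_{j \le n} x_j$ directly, and the same average through the
partial-summation identity used in the proof.

```python
import math

def kronecker_demo(n):
    """For x_j = (-1)**j * sqrt(j), a_j = j: return (b_n, (1/a_n) * sum x_j, Abel form)."""
    x = [0.0] + [(-1) ** j * math.sqrt(j) for j in range(1, n + 1)]
    a = [float(j) for j in range(n + 1)]          # a_0 = 0, a_j = j
    b = [0.0] * (n + 1)                           # b_0 = 0, b_j = sum_{i<=j} x_i / a_i
    for j in range(1, n + 1):
        b[j] = b[j - 1] + x[j] / a[j]
    average = sum(x[1:]) / a[n]
    # Abel's partial summation: b_n - (1/a_n) * sum_{j<n} b_j (a_{j+1} - a_j)
    abel = b[n] - sum(b[j] * (a[j + 1] - a[j]) for j in range(n)) / a[n]
    return b[n], average, abel

print(kronecker_demo(100_000))
```

The two forms of the average agree up to rounding, and both are small for
large $n$ even though the terms $x_j$ themselves are unbounded.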


Now let $\varphi$ be a positive, even, and continuous function on
$\mathscr{R}^1$ such that as $|x|$ increases,

$$\frac{\varphi(x)}{|x|} \uparrow, \qquad \frac{\varphi(x)}{x^2} \downarrow. \tag{1}$$


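Theorem 5.4.1. Let $\{X_n\}$ be a sequence of independent r.v.’s with
$\mathscr{E}(X_n) = 0$ for every $n$; and $0 < a_n \uparrow \infty$. If $\varphi$
satisfies the conditions above and

$$\sum_n \frac{\mathscr{E}(\varphi(X_n))}{\varphi(a_n)} < \infty, \tag{2}$$

then

$$\sum_n \frac{X_n}{a_n} \text{ converges a.e.} \tag{3}$$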

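PROOF. Denote the d.f. of $X_n$ by $F_n$. Define for each $n$:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le a_n, \\ 0, & \text{if } |X_n(\omega)| > a_n. \end{cases} \tag{4}$$

Then

$$\sum_n \mathscr{E}\Bigl(\frac{Y_n^2}{a_n^2}\Bigr) = \sum_n \int_{|x| \le a_n} \frac{x^2}{a_n^2} \, dF_n(x).$$

By the second hypothesis in (1), we have

$$\frac{x^2}{a_n^2} \le \frac{\varphi(x)}{\varphi(a_n)} \quad \text{for } |x| \le a_n.$$

It follows that

$$\sum_n \sigma^2\Bigl(\frac{Y_n}{a_n}\Bigr) \le \sum_n \mathscr{E}\Bigl(\frac{Y_n^2}{a_n^2}\Bigr) \le \sum_n \int_{|x| \le a_n} \frac{\varphi(x)}{\varphi(a_n)} \, dF_n(x)
\le \sum_n \frac{\mathscr{E}(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

Thus, for the r.v.’s $\{Y_n - \mathscr{E}(Y_n)\}/a_n$, the series (iii) in
Theorem 5.3.3 converges, while the two other series vanish for $A = 2$, since
$|Y_n - \mathscr{E}(Y_n)| \le 2a_n$; hence

$$\sum_n \frac{1}{a_n} \{Y_n - \mathscr{E}(Y_n)\} \text{ converges a.e.} \tag{5}$$

Next we have

$$\sum_n \frac{|\mathscr{E}(Y_n)|}{a_n} = \sum_n \frac{1}{a_n} \Bigl|\int_{|x| \le a_n} x \, dF_n(x)\Bigr| = \sum_n \frac{1}{a_n} \Bigl|\int_{|x| > a_n} x \, dF_n(x)\Bigr|
\le \sum_n \int_{|x| > a_n} \frac{|x|}{a_n} \, dF_n(x),$$

where the second equation follows from $\int_{-\infty}^{\infty} x \, dF_n(x) = 0$.
By the first hypothesis in (1), we have

$$\frac{|x|}{a_n} \le \frac{\varphi(x)}{\varphi(a_n)} \quad \text{for } |x| > a_n.$$

It follows that

$$\sum_n \frac{|\mathscr{E}(Y_n)|}{a_n} \le \sum_n \int_{|x| > a_n} \frac{\varphi(x)}{\varphi(a_n)} \, dF_n(x) \le \sum_n \frac{\mathscr{E}(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

This and (5) imply that $\sum_n (Y_n/a_n)$ converges a.e. Finally, since
$\varphi \uparrow$, we have

$$\sum_n \mathscr{P}\{X_n \ne Y_n\} = \sum_n \int_{|x| > a_n} dF_n(x) \le \sum_n \int_{|x| > a_n} \frac{\varphi(x)}{\varphi(a_n)} \, dF_n(x)
\le \sum_n \frac{\mathscr{E}(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

Thus, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences and (3) follows.

Applying Kronecker’s lemma to (3) for each $\omega$ in a set of probability
one, we obtain the next result.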


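Corollary. Under the hypotheses of the theorem, we have

$$\frac{1}{a_n} \sum_{j=1}^{n} X_j \to 0 \quad \text{a.e.} \tag{6}$$

Particular cases. (i) Let $\varphi(x) = |x|^p$, $1 \le p \le 2$; $a_n = n$. Then
we have

$$\sum_n \frac{1}{n^p} \mathscr{E}(|X_n|^p) < \infty \ \Rightarrow\ \frac{1}{n} \sum_{j=1}^{n} X_j \to 0 \quad \text{a.e.} \tag{7}$$

For $p = 2$, this is due to Kolmogorov; for $1 \le p < 2$, it is due to
Marcinkiewicz and Zygmund.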


(ii) Suppose for some $\delta$, $0 < \delta \le 1$ and $M < \infty$ we have

$$\forall n: \mathscr{E}(|X_n|^{1+\delta}) \le M.$$

Then the hypothesis in (7) is clearly satisfied with $p = 1 + \delta$. This case
is due to Markov. Cantelli’s theorem under total independence (Exercise 4 of
Sec. 5.1) is a special case.


(iii) By proper choice of $\{a_n\}$, we can considerably sharpen the
conclusion (6). Suppose

$$\forall n: \sigma^2(X_n) = \sigma_n^2 < \infty, \qquad \sigma^2(S_n) = s_n^2 = \sum_{j=1}^{n} \sigma_j^2 \to \infty.$$

Choose $\varphi(x) = x^2$ and $a_n = s_n (\log s_n)^{(1/2)+\epsilon}$,
$\epsilon > 0$, in the corollary to Theorem 5.4.1. Then

$$\sum_n \frac{\mathscr{E}(X_n^2)}{a_n^2} = \sum_n \frac{\sigma_n^2}{s_n^2 (\log s_n)^{1+2\epsilon}} < \infty$$

by Dini’s theorem, and consequently

$$\frac{S_n}{s_n (\log s_n)^{(1/2)+\epsilon}} \to 0 \quad \text{a.e.}$$

In case all $\sigma_n^2 = 1$ so that $s_n^2 = n$, the above ratio has a
denominator that is close to $n^{1/2}$. Later we shall see that $n^{1/2}$ is a
critical order of magnitude for $S_n$.

We now come to the strong version of Theorem 5.2.2 in the totally
independent case. This result is also due to Kolmogorov.

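Theorem 5.4.2. Let $\{X_n\}$ be a sequence of independent and identically
distributed r.v.’s. Then we have

$$\mathscr{E}(|X_1|) < \infty \ \Rightarrow\ \frac{S_n}{n} \to \mathscr{E}(X_1) \quad \text{a.e.}, \tag{8}$$

$$\mathscr{E}(|X_1|) = \infty \ \Rightarrow\ \limsup_{n \to \infty} \frac{|S_n|}{n} = +\infty \quad \text{a.e.} \tag{9}$$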


PROOF. To prove (8) define $\{Y_n\}$ as in (4) with $a_n = n$. Since

$$\sum_n \mathscr{P}\{X_n \ne Y_n\} = \sum_n \mathscr{P}\{|X_n| > n\} = \sum_n \mathscr{P}\{|X_1| > n\} < \infty$$

by Theorem 3.2.1, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences. Let us apply
(7) to $\{Y_n - \mathscr{E}(Y_n)\}$, with $\varphi(x) = x^2$. We have

$$\sum_n \frac{\sigma^2(Y_n)}{n^2} \le \sum_n \frac{\mathscr{E}(Y_n^2)}{n^2} = \sum_n \frac{1}{n^2} \int_{|x| \le n} x^2 \, dF(x). \tag{10}$$

We are obliged to estimate the last written second moment in terms of the first
moment, since this is the only one assumed in the hypothesis. The standard
technique is to split the interval of integration and then invert the repeated
summation, as follows:

$$\begin{aligned}
\sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{j=1}^{n} \int_{j-1 < |x| \le j} x^2 \, dF(x)
&= \sum_{j=1}^{\infty} \int_{j-1 < |x| \le j} x^2 \, dF(x) \sum_{n=j}^{\infty} \frac{1}{n^2} \\
&\le \sum_{j=1}^{\infty} j \int_{j-1 < |x| \le j} |x| \, dF(x) \cdot \frac{C}{j}
= C \sum_{j=1}^{\infty} \int_{j-1 < |x| \le j} |x| \, dF(x) \\
&= C \mathscr{E}(|X_1|) < \infty.
\end{aligned}$$

In the above we have used the elementary estimate
$\sum_{n=j}^{\infty} n^{-2} \le C j^{-1}$ for some constant $C$ and all $j \ge 1$.
Thus the first sum in (10) converges, and we conclude by (7) that

$$\frac{1}{n} \sum_{j=1}^{n} \{Y_j - \mathscr{E}(Y_j)\} \to 0 \quad \text{a.e.}$$

Clearly $\mathscr{E}(Y_n) \to \mathscr{E}(X_1)$ as $n \to \infty$; hence also

$$\frac{1}{n} \sum_{j=1}^{n} \mathscr{E}(Y_j) \to \mathscr{E}(X_1),$$

and consequently

$$\frac{1}{n} \sum_{j=1}^{n} Y_j \to \mathscr{E}(X_1) \quad \text{a.e.}$$

By Theorem 5.2.1, the left side above may be replaced by
$(1/n) \sum_{j=1}^{n} X_j$, proving (8).

To prove (9), we note that $\mathscr{E}(|X_1|) = \infty$ implies
$\mathscr{E}(|X_1|/A) = \infty$ for each $A > 0$ and hence, by Theorem 3.2.1,

$$\sum_n \mathscr{P}(|X_1| > An) = +\infty.$$

Since the r.v.’s are identically distributed, it follows that

$$\sum_n \mathscr{P}(|X_n| > An) = +\infty.$$

Now the argument in the example at the end of Sec. 5.2 may be repeated
without any change to establish the conclusion of (9).
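
The contrast between (8) and (9) can be seen in a rough simulation. The sketch
below is illustrative only: the exponential law (mean 1) and the Pareto law
with $\mathscr{P}\{X > x\} = x^{-1/2}$ for $x \ge 1$ (infinite mean) are
assumed examples, and `running_average` is a hypothetical helper built on the
standard-library generators `expovariate` and `paretovariate`.

```python
import random

def running_average(draw, n, seed=0):
    """Return S_n / n for n i.i.d. draws produced by draw(rng)."""
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n

# Finite mean: exponential with E(X_1) = 1, so S_n / n should be near 1.
finite_mean = running_average(lambda rng: rng.expovariate(1.0), 100_000)

# Infinite mean: Pareto with tail x**(-1/2); S_n / n is dominated by the largest term.
infinite_mean = running_average(lambda rng: rng.paretovariate(0.5), 100_000)

print(finite_mean, infinite_mean)
```

In the first case the average settles near the mean; in the second it is
typically very large and unstable from one seed to another, in line with (9).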


Let us remark that the first part of the preceding theorem is a special case 
of G. D. Birkhoff’s ergodic theorem, but it was discovered a little earlier and 
the proof is substantially simpler. 

N. Etemadi proved an unexpected generalization of Theorem 5.4.2: (8) 
is true when the (total) independence of {X,} is weakened to pairwise 
independence (An elementary proof of the strong law of large numbers, 
Z. Wahrscheinlichkeitstheorie 55 (1981), 119-122). 

Here is an interesting extension of the law of large numbers when the 
mean is infinite, due to Feller (1946). 


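Theorem 5.4.3. Let $\{X_n\}$ be as in Theorem 5.4.2 with
$\mathscr{E}(|X_1|) = \infty$. Let $\{a_n\}$ be a sequence of positive numbers
satisfying the condition $a_n/n \uparrow$. Then we have

$$\limsup_{n \to \infty} \frac{|S_n|}{a_n} = 0 \ \text{a.e.}, \quad \text{or} \ = +\infty \ \text{a.e.} \tag{11}$$

according as

$$\sum_n \mathscr{P}\{|X_n| \ge a_n\} = \sum_n \int_{|x| \ge a_n} dF(x) < \infty, \quad \text{or} \ = +\infty. \tag{12}$$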

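PROOF. Writing

$$\int_{|x| \ge a_n} dF(x) = \sum_{k=n}^{\infty} \int_{a_k \le |x| < a_{k+1}} dF(x),$$

substituting into (12) and rearranging the double series, we see that the series
in (12) converges if and only if

$$\sum_k k \int_{a_{k-1} \le |x| < a_k} dF(x) < \infty. \tag{13}$$

Assuming this is the case, we put

$$\mu_n = \int_{|x| < a_n} x \, dF(x);$$

$$Y_n = \begin{cases} X_n - \mu_n & \text{if } |X_n| < a_n, \\ -\mu_n & \text{if } |X_n| \ge a_n. \end{cases}$$

Thus $\mathscr{E}(Y_n) = 0$. We have by (12),

$$\sum_n \mathscr{P}\{Y_n \ne X_n - \mu_n\} < \infty. \tag{14}$$

Next, with $a_0 = 0$:

$$\begin{aligned}
\sum_n \mathscr{E}\Bigl(\frac{Y_n^2}{a_n^2}\Bigr) &\le \sum_n \frac{1}{a_n^2} \int_{|x| < a_n} x^2 \, dF(x)
= \sum_{n=1}^{\infty} \frac{1}{a_n^2} \sum_{k=1}^{n} \int_{a_{k-1} \le |x| < a_k} x^2 \, dF(x) \\
&\le \sum_{k=1}^{\infty} \int_{a_{k-1} \le |x| < a_k} dF(x) \ a_k^2 \sum_{n=k}^{\infty} \frac{1}{a_n^2}.
\end{aligned}$$

Since $a_n/n \ge a_k/k$ for $n \ge k$, we have

$$\sum_{n=k}^{\infty} \frac{1}{a_n^2} \le \frac{k^2}{a_k^2} \sum_{n=k}^{\infty} \frac{1}{n^2} \le \frac{2k}{a_k^2},$$

and so

$$\sum_n \mathscr{E}\Bigl(\frac{Y_n^2}{a_n^2}\Bigr) \le \sum_{k=1}^{\infty} 2k \int_{a_{k-1} \le |x| < a_k} dF(x) < \infty$$

by (13). Hence $\sum_n Y_n/a_n$ converges (absolutely) a.e. by Theorem 5.4.1, and
so by Kronecker’s lemma:

$$\frac{1}{a_n} \sum_{k=1}^{n} Y_k \to 0 \quad \text{a.e.} \tag{15}$$

We now estimate the quantity

$$\frac{1}{a_n} \sum_{k=1}^{n} \mu_k = \frac{1}{a_n} \sum_{k=1}^{n} \int_{|x| < a_k} x \, dF(x) \tag{16}$$

as $n \to \infty$. Clearly for any $N < n$, it is bounded in absolute value by

$$\frac{n}{a_n} \Bigl(a_N + \int_{a_N \le |x| < a_n} |x| \, dF(x)\Bigr). \tag{17}$$

Since $\mathscr{E}(|X_1|) = \infty$, the series in (12) cannot converge if $a_n/n$
remains bounded (see Exercise 4 of Sec. 3.2). Hence for fixed $N$ the term
$(n/a_n) a_N$ in (17) tends to 0 as $n \to \infty$. The rest is bounded by

$$\frac{n}{a_n} \sum_{j=N+1}^{n} a_j \int_{a_{j-1} \le |x| < a_j} dF(x) \le \sum_{j=N+1}^{n} j \int_{a_{j-1} \le |x| < a_j} dF(x) \tag{18}$$

because $n a_j/a_n \le j$ for $j \le n$. We may now as well replace the $n$ in the
right-hand member above by $\infty$; as $N \to \infty$, it tends to 0 as the
remainder of the convergent series in (13). Thus the quantity in (16) tends to 0
as $n \to \infty$; combining this with (14) and (15), we obtain the first
alternative in (11).

The second alternative is proved in much the same way as in Theorem 5.4.2
and is left as an exercise. Note that when $a_n = n$ it reduces to (9) above.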


Corollary. Under the conditions of the theorem, we have

$$\mathscr{P}\{|S_n| \ge a_n \text{ i.o.}\} = \mathscr{P}\{|X_n| \ge a_n \text{ i.o.}\}. \tag{19}$$

This follows because the second probability above is equal to 0 or 1
according as the series in (12) converges or diverges, by the Borel–Cantelli
lemma. The result is remarkable in suggesting that the $n$th partial sum and
the $n$th individual term of the sequence $\{X_n\}$ have comparable growth in a
certain sense. This is in contrast to the situation when
$\mathscr{E}(|X_1|) < \infty$ and (19) is false for $a_n = n$ (see Exercise 12
below).


EXERCISES 


The $X_n$’s are independent throughout; in Exercises 1, 6, 8, 9, and 12 they are
also identically distributed; $S_n = \sum_{j=1}^{n} X_j$.

*1. If $\mathscr{E}(X_1^+) = +\infty$, $\mathscr{E}(X_1^-) < \infty$, then
$S_n/n \to +\infty$ a.e.

*2. There is a complement to Theorem 5.4.1 as follows. Let $\{a_n\}$ and
$\varphi$ be as there except that the conditions in (1) are replaced by the
condition that $\varphi(x) \uparrow$ and $\varphi(x)/|x| \downarrow$. Then again
(2) implies (3).

3. Let $\{X_n\}$ be independent and identically distributed r.v.’s such that
$\mathscr{E}(|X_1|^p) < \infty$ for some $p: 0 < p < 2$; in case $p > 1$, we
assume also that $\mathscr{E}(X_1) = 0$. Then $S_n n^{-(1/p)-\epsilon} \to 0$ a.e.
For $p = 1$ the result is weaker than Theorem 5.4.2.


4. Both Theorem 5.4.1 and its complement in Exercise 2 above are “best
possible” in the following sense. Let $\{a_n\}$ and $\varphi$ be as before and
suppose that

$$b_n > 0, \qquad \sum_n \frac{b_n}{\varphi(a_n)} = \infty.$$

Then there exists a sequence of independent and identically distributed r.v.’s
$\{X_n\}$ such that $\mathscr{E}(X_n) = 0$, $\mathscr{E}(\varphi(X_n)) = b_n$, and

$$\mathscr{P}\Bigl\{\sum_n \frac{X_n}{a_n} \text{ converges}\Bigr\} = 0.$$

[HINT: Make $\sum_n \mathscr{P}\{|X_n| \ge a_n\} = \infty$ by letting each $X_n$
take two or three values only according as $b_n/\varphi(a_n) \ge 1$ or $< 1$.]

5. Let $X_n$ take the values $\pm n^{\theta}$ with probability $\frac{1}{2}$
each. If $0 \le \theta < \frac{1}{2}$, then $S_n/n \to 0$ a.e. What if
$\theta \ge \frac{1}{2}$? [HINT: To answer the question, use Theorem 5.2.3 or
Exercise 12 of Sec. 5.2; an alternative method is to consider the
characteristic function of $S_n/n$ (see Chapter 6).]


6. Let $\mathscr{E}(X_1) = 0$ and $\{c_n\}$ be a bounded sequence of real
numbers. Then

$$\frac{1}{n} \sum_{j=1}^{n} c_j X_j \to 0 \quad \text{a.e.}$$

[HINT: Truncate $X_n$ at $n$ and proceed as in Theorem 5.4.2.]
7. We have $S_n/n \to 0$ a.e. if and only if the following two conditions
are satisfied:

(i) $S_n/n \to 0$ in pr.,
(ii) $S_{2^n}/2^n \to 0$ a.e.;

an alternative set of conditions is (i) and

(ii') $\forall \epsilon > 0: \sum_n \mathscr{P}(|S_{2^{n+1}} - S_{2^n}| > 2^n \epsilon) < \infty$.

*8. If $\mathscr{E}(|X_1|) < \infty$, then the sequence $\{S_n/n\}$ is uniformly
integrable and $S_n/n \to \mathscr{E}(X_1)$ in $L^1$ as well as a.e.

9. Construct an example where $\mathscr{E}(X_1^+) = \mathscr{E}(X_1^-) = +\infty$
and $S_n/n \to +\infty$ a.e. [HINT: Let $0 < \alpha < \beta < 1$ and take a d.f.
$F$ such that $1 - F(x) \sim x^{-\alpha}$ as $x \to \infty$, and
$\int_{-\infty}^{0} |x|^{\beta} \, dF(x) < \infty$. Show that

$$\sum_n \mathscr{P}\Bigl\{\max_{1 \le j \le n} X_j^+ \le n^{1/\alpha'}\Bigr\} < \infty$$

for every $\alpha' > \alpha$ and use Exercise 3 for $\sum_{j=1}^{n} X_j^-$.] This
example is due to Derman and Robbins. Necessary and sufficient condition for
$S_n/n \to +\infty$ has been given recently by K. B. Erickson.


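10. Suppose there exist an $\alpha$, $0 < \alpha < 2$, $\alpha \ne 1$, and two
constants $A_1$ and $A_2$ such that

$$\forall n, \forall x > 0: \quad \frac{A_1}{x^{\alpha}} \le \mathscr{P}\{|X_n| > x\} \le \frac{A_2}{x^{\alpha}}.$$

If $\alpha > 1$, suppose also that $\mathscr{E}(X_n) = 0$ for each $n$. Then for
any sequence $\{a_n\}$ increasing to infinity, we have

$$\mathscr{P}\{|S_n| > a_n \text{ i.o.}\} = \begin{cases} 0 \\ 1 \end{cases} \quad \text{if} \quad \sum_n \frac{1}{a_n^{\alpha}} \ \begin{cases} < \infty, \\ = \infty. \end{cases}$$

[This result, due to P. Lévy and Marcinkiewicz, was stated with a superfluous
condition on $\{a_n\}$. Proceed as in Theorem 5.3.3 but truncate $X_n$ at $a_n$;
direct estimates are easy.]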


11. Prove the second alternative in Theorem 5.4.3. 

12. If $\mathscr{E}(X_1) \ne 0$, then $\max_{1 \le k \le n} |X_k|/|S_n| \to 0$ a.e.
[HINT: $|X_n|/n \to 0$ a.e.]

13. Under the assumptions in Theorem 5.4.2, if $S_n/n$ converges a.e. then
$\mathscr{E}(|X_1|) < \infty$. [HINT: $X_n/n$ converges to 0 a.e., hence
$\mathscr{P}\{|X_n| > n \text{ i.o.}\} = 0$; use Theorem 4.2.4 to get
$\sum_n \mathscr{P}\{|X_1| > n\} < \infty$.]


5.5 Applications 


The law of large numbers has numerous applications in all parts of proba- 
bility theory and in other related fields such as combinatorial analysis and 
statistics. We shall illustrate this by two examples involving certain important 
new concepts. 

The first deals with so-called “empiric distributions” in sampling theory.
Let $\{X_n, n \ge 1\}$ be a sequence of independent, identically distributed
r.v.’s with the common d.f. $F$. This is sometimes referred to as the
“underlying” or “theoretical distribution” and is regarded as “unknown” in
statistical lingo. For each $\omega$, the values $X_n(\omega)$ are called
“samples” or “observed values”, and the idea is to get some information on $F$
by looking at the samples. For each $n$, and each $\omega \in \Omega$, let the
$n$ real numbers $\{X_j(\omega), 1 \le j \le n\}$ be arranged in increasing
order as

$$Y_{n1}(\omega) \le Y_{n2}(\omega) \le \cdots \le Y_{nn}(\omega). \tag{1}$$

Now define a discrete d.f. $F_n(\cdot, \omega)$ as follows:

$$F_n(x, \omega) = \begin{cases} 0, & \text{if } x < Y_{n1}(\omega), \\ \dfrac{k}{n}, & \text{if } Y_{nk}(\omega) \le x < Y_{n,k+1}(\omega),\ 1 \le k \le n - 1, \\ 1, & \text{if } x \ge Y_{nn}(\omega). \end{cases}$$

In other words, for each $x$, $nF_n(x, \omega)$ is the number of values of $j$,
$1 \le j \le n$, for which $X_j(\omega) \le x$; or again $F_n(x, \omega)$ is the
observed frequency of sample values not exceeding $x$. The function
$F_n(\cdot, \omega)$ is called the empiric distribution function based on $n$
samples from $F$.
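
The definition translates directly into a short computation. The following is
a minimal sketch, with the hypothetical name `empirical_df`, that sorts the
observed values and counts how many do not exceed $x$; the sample used at the
end is made up for illustration.

```python
from bisect import bisect_right

def empirical_df(samples):
    """Return F_n(., omega) as a function for the observed values in samples."""
    ordered = sorted(samples)            # Y_{n1} <= Y_{n2} <= ... <= Y_{nn}
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the sample values that are <= x
        return bisect_right(ordered, x) / n

    return F_n

F_n = empirical_df([3, 1, 2, 2])
print([F_n(x) for x in (0, 1, 2, 2.5, 3)])   # [0.0, 0.25, 0.75, 0.75, 1.0]
```

Repeated sample values simply produce a larger jump, as at $x = 2$ in the
sample above, so the function stays right continuous.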

For each $x$, $F_n(x, \cdot)$ is an r.v., as we can easily check. Let us
introduce also the indicator r.v.’s $\{\xi_j(x), j \ge 1\}$ as follows:

$$\xi_j(x, \omega) = \begin{cases} 1 & \text{if } X_j(\omega) \le x, \\ 0 & \text{if } X_j(\omega) > x. \end{cases}$$

We have then

$$F_n(x, \omega) = \frac{1}{n} \sum_{j=1}^{n} \xi_j(x, \omega).$$

For each $x$, the sequence $\{\xi_j(x)\}$ is totally independent since $\{X_j\}$
is, by Theorem 3.3.1. Furthermore they have the common “Bernoullian
distribution”, taking the values 1 and 0 with probabilities $p$ and
$q = 1 - p$, where

$$p = F(x), \qquad q = 1 - F(x);$$

thus $\mathscr{E}(\xi_j(x)) = F(x)$. The strong law of large numbers in the form
Theorem 5.1.2 or 5.4.2 applies, and we conclude that


$$F_n(x, \omega) \to F(x) \quad \text{a.e.} \tag{2}$$

Matters end here if we are interested only in a particular value of $x$, or a
finite number of values, but since both members in (2) contain the parameter
$x$, which ranges over the whole real line, how much better it would be to make
a global statement about the functions $F_n(\cdot, \omega)$ and $F(\cdot)$. We
shall do this in the theorem below. Observe first the precise meaning of (2):
for each $x$, there exists a null set $N(x)$ such that (2) holds for
$\omega \in \Omega \setminus N(x)$. It follows that (2) also holds simultaneously
for all $x$ in any given countable set $Q$, such as the set of rational numbers,
for $\omega \in \Omega \setminus N$, where

$$N = \bigcup_{x \in Q} N(x)$$

is again a null set. Hence by the definition of vague convergence in Sec. 4.4,
we can already assert that

$$F_n(\cdot, \omega) \xrightarrow{v} F(\cdot) \quad \text{for a.e. } \omega.$$

This will be further strengthened in two ways: convergence for all $x$ and
uniformity. The result is due to Glivenko and Cantelli.


Theorem 5.5.1. We have as $n \to \infty$

$$\sup_{-\infty < x < \infty} |F_n(x, \omega) - F(x)| \to 0 \quad \text{a.e.}$$

PROOF. Let $J$ be the countable set of jumps of $F$. For each $x \in J$, define

$$\eta_j(x, \omega) = \begin{cases} 1, & \text{if } X_j(\omega) = x; \\ 0, & \text{if } X_j(\omega) \ne x. \end{cases}$$

Then for $x \in J$:

$$F_n(x+, \omega) - F_n(x-, \omega) = \frac{1}{n} \sum_{j=1}^{n} \eta_j(x, \omega),$$

and it follows as before that there exists a null set $N(x)$ such that if
$\omega \in \Omega \setminus N(x)$, then

$$F_n(x+, \omega) - F_n(x-, \omega) \to F(x+) - F(x-). \tag{3}$$

Now let $N_1 = \bigcup_{x \in Q \cup J} N(x)$, then $N_1$ is a null set, and if
$\omega \in \Omega \setminus N_1$, then (3) holds for every $x \in J$ and we have
also

$$F_n(x, \omega) \to F(x) \tag{4}$$

for every $x \in Q$. Hence the theorem will follow from the following analytical
result.

Lemma. Let $F_n$ and $F$ be (right continuous) d.f.’s, $Q$ and $J$ as before.
Suppose that we have

$$\forall x \in Q: F_n(x) \to F(x);$$

$$\forall x \in J: F_n(x) - F_n(x-) \to F(x) - F(x-).$$

Then $F_n$ converges uniformly to $F$ in $\mathscr{R}^1$.

PROOF. Suppose the contrary, then there exist $\epsilon > 0$, a sequence
$\{n_k\}$ of integers tending to infinity, and a sequence $\{x_k\}$ in
$\mathscr{R}^1$ such that for all $k$:

$$|F_{n_k}(x_k) - F(x_k)| \ge \epsilon > 0. \tag{5}$$

This is clearly impossible if $x_k \to +\infty$ or $x_k \to -\infty$. Excluding
these cases, we may suppose by taking a subsequence that
$x_k \to \xi \in \mathscr{R}^1$. Now consider four possible cases and the
respective inequalities below, valid for all sufficiently large $k$, where
$r_1 \in Q$, $r_2 \in Q$, $r_1 < \xi < r_2$.

Case 1. $x_k \uparrow \xi$, $x_k < \xi$:

$$\begin{aligned}
\epsilon &\le F_{n_k}(x_k) - F(x_k) \le F_{n_k}(\xi-) - F(r_1) \\
&\le F_{n_k}(\xi-) - F_{n_k}(\xi) + F_{n_k}(r_2) - F(r_2) + F(r_2) - F(r_1).
\end{aligned}$$

Case 2. $x_k \uparrow \xi$, $x_k < \xi$:

$$\begin{aligned}
\epsilon &\le F(x_k) - F_{n_k}(x_k) \le F(\xi-) - F_{n_k}(r_1) \\
&= F(\xi-) - F(r_1) + F(r_1) - F_{n_k}(r_1).
\end{aligned}$$

Case 3. $x_k \downarrow \xi$, $x_k \ge \xi$:

$$\begin{aligned}
\epsilon &\le F(x_k) - F_{n_k}(x_k) \le F(r_2) - F_{n_k}(\xi) \\
&\le F(r_2) - F(r_1) + F(r_1) - F_{n_k}(r_1) + F_{n_k}(\xi-) - F_{n_k}(\xi).
\end{aligned}$$

Case 4. $x_k \downarrow \xi$, $x_k \ge \xi$:

$$\begin{aligned}
\epsilon &\le F_{n_k}(x_k) - F(x_k) \le F_{n_k}(r_2) - F(\xi) \\
&= F_{n_k}(r_2) - F_{n_k}(r_1) + F_{n_k}(r_1) - F(r_1) + F(r_1) - F(\xi).
\end{aligned}$$

In each case let first $k \to \infty$, then $r_1 \uparrow \xi$,
$r_2 \downarrow \xi$; then the last member of each chain of inequalities does
not exceed a quantity which tends to 0 and a contradiction is obtained.


Remark. The reader will do well to observe the way the proof above
is arranged. Having chosen a set of $\omega$ with probability one, for each
fixed $\omega$ in this set we reason with the corresponding sample functions
$F_n(\cdot, \omega)$ and $F(\cdot, \omega)$ without further intervention of
probability. Such a procedure is standard in the theory of stochastic
processes.
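
The uniform convergence in Theorem 5.5.1 can be observed for a single
simulated $\omega$. The sketch below assumes, purely for illustration, that $F$
is the uniform d.f. $F(x) = x$ on $[0, 1]$; the name `sup_distance_uniform` and
the seed are hypothetical. Because $F_n(\cdot, \omega)$ is a step function and
$F$ is continuous, the supremum is attained at one of the jumps, approached
from the left or from the right.

```python
import random

def sup_distance_uniform(n, seed=0):
    """sup_x |F_n(x) - F(x)| for n samples from the uniform d.f. F(x) = x on [0, 1]."""
    rng = random.Random(seed)
    ordered = sorted(rng.random() for _ in range(n))
    # F_n jumps from (k - 1)/n to k/n at the k-th order statistic y, so it is
    # enough to compare F(y) = y with both values at each jump.
    return max(max(k / n - y, y - (k - 1) / n)
               for k, y in enumerate(ordered, start=1))

for n in (100, 1_000, 20_000):
    print(n, sup_distance_uniform(n))
```

The printed distances should decrease roughly like $n^{-1/2}$, although the
theorem itself asserts only convergence to zero.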


Our next application is to renewal theory. Let $\{X_n, n \ge 1\}$ again be a
sequence of independent and identically distributed r.v.’s. We shall further
assume that they are positive, although this hypothesis can be dropped, and
that they are not identically zero a.e. It follows that the common mean is
strictly positive but may be $+\infty$. Now the successive r.v.’s are
interpreted as “lifespans” of certain objects undergoing a process of renewal,
or the “return periods” of certain recurrent phenomena. Typical examples are
the ages of a succession of living beings and the durations of a sequence of
services. This raises theoretical as well as practical questions such as:
given an epoch in time, how many renewals have there been before it? how long
ago was the last renewal? how soon will the next be?

Let us consider the first question. Given the epoch $t \ge 0$, let
$N(t, \omega)$ be the number of renewals up to and including the time $t$. It
is clear that we have

$$\{\omega: N(t, \omega) = n\} = \{\omega: S_n(\omega) \le t < S_{n+1}(\omega)\}, \tag{6}$$

valid for $n \ge 0$, provided $S_0 = 0$. Summing over $n \le m - 1$, we obtain

$$\{\omega: N(t, \omega) < m\} = \{\omega: S_m(\omega) > t\}. \tag{7}$$

This shows in particular that for each $t > 0$, $N(t) = N(t, \cdot)$ is a
discrete r.v. whose range is the set of all natural numbers. The family of
r.v.’s $\{N(t)\}$ indexed by $t \in [0, \infty)$ may be called a renewal
process. If the common distribution $F$ of the $X_n$’s is the exponential
$F(x) = 1 - e^{-\lambda x}$, $x \ge 0$; where $\lambda > 0$, then
$\{N(t), t \ge 0\}$ is just the simple Poisson process with parameter
$\lambda$.

Let us prove first that

$$\lim_{t \to \infty} N(t) = +\infty \quad \text{a.e.}, \tag{8}$$

namely that the total number of renewals becomes infinite with time. This is
almost obvious, but the proof follows. Since $N(t, \omega)$ increases with $t$,
the limit in (8) certainly exists for every $\omega$. Were it finite on a set of
strictly positive probability, there would exist an integer $M$ such that

$$\mathscr{P}\Bigl\{\sup_{0 \le t < \infty} N(t, \omega) < M\Bigr\} > 0.$$

This implies by (7) that

$$\mathscr{P}\{S_M(\omega) = +\infty\} > 0,$$

which is impossible. (Only because we have laid down the convention long
ago that an r.v. such as $X_1$ should be finite-valued unless otherwise
specified.)

Next let us write

$$0 < m = \mathscr{E}(X_1) \le +\infty, \tag{9}$$

and suppose for the moment that $m < +\infty$. Then, according to the strong law
of large numbers (Theorem 5.4.2), $S_n/n \to m$ a.e. Specifically, there exists
a null set $Z_1$ such that

$$\forall \omega \in \Omega \setminus Z_1: \lim_{n \to \infty} \frac{X_1(\omega) + \cdots + X_n(\omega)}{n} = m.$$

We have just proved that there exists a null set $Z_2$ such that

$$\forall \omega \in \Omega \setminus Z_2: \lim_{t \to \infty} N(t, \omega) = +\infty.$$

Now for each fixed $\omega_0$, if the numerical sequence
$\{a_n(\omega_0), n \ge 1\}$ converges to a finite (or infinite) limit $m$ and
at the same time the numerical function $\{N(t, \omega_0), 0 \le t < \infty\}$
tends to $+\infty$ as $t \to +\infty$, then the very definition of a limit
implies that the numerical function
$\{a_{N(t, \omega_0)}(\omega_0), 0 \le t < \infty\}$ converges to the limit $m$
as $t \to +\infty$. Applying this trivial but fundamental observation to

$$a_n = \frac{1}{n} \sum_{j=1}^{n} X_j$$

for each $\omega$ in $\Omega \setminus (Z_1 \cup Z_2)$, we conclude that

$$\lim_{t \to \infty} \frac{S_{N(t, \omega)}(\omega)}{N(t, \omega)} = m \quad \text{a.e.} \tag{10}$$

By the definition of $N(t, \omega)$, the numerator on the left side should be
close to $t$; this will be confirmed and strengthened in the following theorem.
Theorem 5.5.2. We have

$$\lim_{t \to \infty} \frac{N(t)}{t} = \frac{1}{m} \quad \text{a.e.} \tag{11}$$

and

$$\lim_{t \to \infty} \frac{\mathscr{E}\{N(t)\}}{t} = \frac{1}{m}; \tag{12}$$

both being true even if $m = +\infty$, provided we take $1/m$ to be 0 in that
case.

PROOF. It follows from (6) that for every $\omega$:

$$S_{N(t, \omega)}(\omega) \le t < S_{N(t, \omega)+1}(\omega)$$

and consequently, as soon as $t$ is large enough to make $N(t, \omega) > 0$,

$$\frac{S_{N(t, \omega)}(\omega)}{N(t, \omega)} \le \frac{t}{N(t, \omega)} < \frac{S_{N(t, \omega)+1}(\omega)}{N(t, \omega) + 1} \cdot \frac{N(t, \omega) + 1}{N(t, \omega)}.$$

Letting $t \to \infty$ and using (8) and (10) (together with Exercise 1 of
Sec. 5.4 in case $m = +\infty$), we conclude that (11) is true.

The deduction of (12) from (11) is more tricky than might have been
thought. Since $X_n$ is not zero a.e., there exists $\delta > 0$ such that

$$\forall n: \mathscr{P}\{X_n \ge \delta\} = p > 0.$$

Define

$$X_n'(\omega) = \begin{cases} \delta, & \text{if } X_n(\omega) \ge \delta; \\ 0, & \text{if } X_n(\omega) < \delta; \end{cases}$$

and let $S_n'$ and $N'(t)$ be the corresponding quantities for the sequence
$\{X_n', n \ge 1\}$. It is obvious that $S_n' \le S_n$ and $N'(t) \ge N(t)$ for
each $t$. Since the r.v.’s $\{X_n'/\delta\}$ are independent with a Bernoullian
distribution, elementary computations (see Exercise 7 below) show that

$$\mathscr{E}\{N'(t)^2\} = O\Bigl(\frac{t^2}{\delta^2}\Bigr) \quad \text{as } t \to \infty.$$

Hence we have, $\delta$ being fixed,

$$\mathscr{E}\Bigl\{\Bigl(\frac{N(t)}{t}\Bigr)^2\Bigr\} \le \mathscr{E}\Bigl\{\Bigl(\frac{N'(t)}{t}\Bigr)^2\Bigr\} = O(1).$$

Since (11) implies the convergence of $N(t)/t$ in distribution to
$\delta_{1/m}$, an application of Theorem 4.5.2 with $X_n = N(n)/n$ and $p = 2$
yields (12) with $t$ replaced by $n$ in (12), from which (12) itself follows at
once.
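
Relation (11) lends itself to a quick simulation. The sketch below is an
illustration under assumptions not made in the text: exponential lifespans
with rate 2 (so $m = \frac{1}{2}$), a single long trajectory, and the
hypothetical name `renewal_count`. It counts renewals exactly as in (6).

```python
import random

def renewal_count(t, draw, seed=0):
    """N(t): the number of renewals up to and including time t, with S_0 = 0."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    while True:
        total += draw(rng)          # total is now S_{count + 1}
        if total > t:
            return count            # S_count <= t < S_{count + 1}, as in (6)
        count += 1

rate = 2.0                          # exponential lifespans with mean m = 1 / rate
t = 10_000.0
n_t = renewal_count(t, lambda rng: rng.expovariate(rate))
print(n_t / t, "compared with 1/m =", rate)
```

With these exponential lifespans $\{N(t)\}$ is the Poisson process mentioned
earlier, and the printed ratio should be close to its parameter.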


Corollary. For each $t$, $\mathscr{E}\{N(t)\} < \infty$.

An interesting relation suggested by (10) is that $\mathscr{E}\{S_{N(t)}\}$
should be close to $m\mathscr{E}\{N(t)\}$ when $t$ is large. The precise result
is as follows:

$$\mathscr{E}\{X_1 + \cdots + X_{N(t)+1}\} = \mathscr{E}\{X_1\} \mathscr{E}\{N(t) + 1\}.$$

This is a striking generalization of the additivity of expectations when the
number of terms as well as the summands is “random.” This follows from the
following more general result, known as “Wald’s equation”.


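Theorem 5.5.3. Let $\{X_n, n \ge 1\}$ be a sequence of independent and
identically distributed r.v.’s with finite mean. For $k \ge 1$ let
$\mathscr{F}_k$, $1 \le k < \infty$, be the Borel field generated by
$\{X_j, 1 \le j \le k\}$. Suppose that $N$ is an r.v. taking positive integer
values such that

$$\forall k \ge 1: \{N \le k\} \in \mathscr{F}_k, \tag{13}$$

and $\mathscr{E}(N) < \infty$. Then we have

$$\mathscr{E}(S_N) = \mathscr{E}(X_1) \mathscr{E}(N).$$

PROOF. Since $S_0 = 0$ as usual, we have

$$\begin{aligned}
\mathscr{E}(S_N) = \int_{\Omega} S_N \, d\mathscr{P} &= \sum_{k=1}^{\infty} \int_{\{N=k\}} S_k \, d\mathscr{P} = \sum_{k=1}^{\infty} \sum_{j=1}^{k} \int_{\{N=k\}} X_j \, d\mathscr{P} \\
&= \sum_{j=1}^{\infty} \sum_{k=j}^{\infty} \int_{\{N=k\}} X_j \, d\mathscr{P} = \sum_{j=1}^{\infty} \int_{\{N \ge j\}} X_j \, d\mathscr{P} \\
&= \sum_{j=1}^{\infty} \Bigl\{\mathscr{E}(X_j) - \int_{\{N \le j-1\}} X_j \, d\mathscr{P}\Bigr\}.
\end{aligned} \tag{14}$$

Now the set $\{N \le j - 1\}$ and the r.v. $X_j$ are independent, hence the last
written integral is equal to $\mathscr{E}(X_j) \mathscr{P}\{N \le j - 1\}$.
Substituting this into the above, we obtain

$$\mathscr{E}(S_N) = \sum_{j=1}^{\infty} \mathscr{E}(X_j) \mathscr{P}\{N \ge j\} = \mathscr{E}(X_1) \sum_{j=1}^{\infty} \mathscr{P}\{N \ge j\} = \mathscr{E}(X_1) \mathscr{E}(N),$$

the last equation by the corollary to Theorem 3.2.1.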


It remains to justify the interchange of summations in (14), which is 
essential here. This is done by replacing X; with |X ;| and obtaining as before 
that the repeated sum is equal to é(|X,|)@(V) < oo. 
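Wald's equation can be checked by simulation. The sketch below takes $N = N(t) + 1 = \min\{n : S_n > t\}$, which is determined by $X_1, \ldots, X_n$ alone, and compares the sample mean of $S_N$ with $\mathscr{E}(X_1)$ times the sample mean of $N$. The exponential lifetimes, the parameter values, and the function names are assumptions chosen only for illustration.

```python
import random

def first_passage(t, rate, rng):
    """Return (N, S_N) with N = min{n : S_n > t} for i.i.d. exponential summands."""
    n, s = 0, 0.0
    while s <= t:
        s += rng.expovariate(rate)   # each X_j has mean 1/rate
        n += 1
    return n, s

def wald_check(t=5.0, rate=2.0, trials=20000, seed=0):
    """Monte Carlo estimates of E(S_N) and of E(X_1) * E(N)."""
    rng = random.Random(seed)
    total_n = 0
    total_s = 0.0
    for _ in range(trials):
        n, s = first_passage(t, rate, rng)
        total_n += n
        total_s += s
    mean_n = total_n / trials
    mean_s = total_s / trials
    return mean_s, (1.0 / rate) * mean_n

lhs, rhs = wald_check()
print(lhs, rhs)   # the two estimates should nearly agree
```

For exponential lifetimes with rate $\lambda$ the count $N(t)$ is Poisson with mean $\lambda t$ (see Exercise 10 of Sec. 6.1), so both sides should be near $t + 1/\lambda$, which is $5.5$ for the values assumed here.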

We leave it to the reader to verify condition (13) for the r.v. $N(t) + 1$ above. Such an r.v. will be called "optional" in Chapter 8 and will play an important role there.

Our last example is a noted triumph of the ideas of probability theory 
applied to classical analysis. It is S. Bernstein’s proof of Weierstrass’ theorem 
on the approximation of continuous functions by polynomials. 


Theorem 5.5.4. Let $f$ be a continuous function on $[0, 1]$, and define the Bernstein polynomials $\{p_n\}$ as follows:

$$p_n(x) = \sum_{k=0}^{n} \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1 - x)^{n-k}. \tag{15}$$

Then $p_n$ converges uniformly to $f$ in $[0, 1]$.


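PROOF. For each $x$, consider a sequence of independent Bernoullian r.v.'s $\{X_n, n \ge 1\}$ with success probability $x$, namely:

$$X_n = \begin{cases} 1 & \text{with probability } x, \\ 0 & \text{with probability } 1 - x; \end{cases}$$

and let $S_n = \sum_{k=1}^{n} X_k$ as usual. We know from elementary probability theory that

$$\mathscr{P}\{S_n = k\} = \binom{n}{k} x^k (1 - x)^{n-k}, \quad 0 \le k \le n,$$

so that

$$\mathscr{E}\left\{ f\left(\frac{S_n}{n}\right) \right\} = \sum_{k=0}^{n} \binom{n}{k} f\left(\frac{k}{n}\right) x^k (1 - x)^{n-k} = p_n(x).$$

We know from the law of large numbers that $S_n/n \to x$ with probability one, but it is sufficient to have convergence in probability, which is Bernoulli's weak law of large numbers. Since $f$ is uniformly continuous in $[0, 1]$, it follows as in the proof of Theorem 4.4.5 that

$$\mathscr{E}\left\{ f\left(\frac{S_n}{n}\right) \right\} \to \mathscr{E}\{f(x)\} = f(x).$$

We have therefore proved the convergence of $p_n(x)$ to $f(x)$ for each $x$. It remains to check the uniformity. Now we have for any $\delta > 0$:

$$\begin{aligned}
|p_n(x) - f(x)| &\le \mathscr{E}\left\{ \left| f\left(\frac{S_n}{n}\right) - f(x) \right| \right\} \\
&= \mathscr{E}\left\{ \left| f\left(\frac{S_n}{n}\right) - f(x) \right| ; \left| \frac{S_n}{n} - x \right| > \delta \right\} + \mathscr{E}\left\{ \left| f\left(\frac{S_n}{n}\right) - f(x) \right| ; \left| \frac{S_n}{n} - x \right| \le \delta \right\},
\end{aligned} \tag{16}$$

where we have written $\mathscr{E}\{Y; \Lambda\}$ for $\int_{\Lambda} Y \, d\mathscr{P}$. Given $\epsilon > 0$, there exists $\delta(\epsilon)$ such that

$$|x - y| \le \delta \Rightarrow |f(x) - f(y)| \le \epsilon/2.$$

With this choice of $\delta$ the last term in (16) is bounded by $\epsilon/2$. The preceding term is clearly bounded by

$$2\|f\| \, \mathscr{P}\left\{ \left| \frac{S_n}{n} - x \right| > \delta \right\}.$$

Now we have by Chebyshev's inequality, since $\mathscr{E}(S_n) = nx$, $\sigma^2(S_n) = nx(1 - x)$, and $x(1 - x) \le \frac{1}{4}$ for $0 \le x \le 1$:

$$\mathscr{P}\left\{ \left| \frac{S_n}{n} - x \right| > \delta \right\} \le \frac{1}{\delta^2} \sigma^2\left(\frac{S_n}{n}\right) = \frac{nx(1 - x)}{\delta^2 n^2} \le \frac{1}{4\delta^2 n}.$$

This is nothing but Chebyshev's proof of Bernoulli's theorem. Hence if $n \ge \|f\|/\delta^2\epsilon$, we get $|p_n(x) - f(x)| \le \epsilon$ in (16). This proves the uniformity.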


One should remark not only on the lucidity of the above derivation but 
also the meaningful construction of the approximating polynomials. Similar 
methods can be used to establish a number of well-known analytical results 
with relative ease; see Exercise 11 below for another example. 
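The construction in Theorem 5.5.4 is easy to try numerically. The following sketch evaluates the Bernstein polynomial (15) directly as the expectation of $f(S_n/n)$ under the binomial distribution and reports the largest error over a grid. The test function $f(x) = |x - \tfrac12|$, the grid size, and the helper names are assumptions made for the example.

```python
import math

def bernstein(f, n, x):
    """Evaluate p_n(x) = sum_k C(n,k) f(k/n) x^k (1-x)^(n-k), as in (15)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def sup_error(f, n, grid=201):
    """Largest |p_n(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)

f = lambda x: abs(x - 0.5)     # continuous but not differentiable at 1/2
errors = [sup_error(f, n) for n in (10, 40, 160)]
print(errors)                  # the errors should shrink as n grows
```

For this $f$ the error at $x = \tfrac12$ is $\mathscr{E}|S_n/n - \tfrac12|$, so the decay is of order $n^{-1/2}$; the convergence is uniform but can be slow.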


EXERCISES 


$\{X_n\}$ is a sequence of independent and identically distributed r.v.'s; $S_n = \sum_{j=1}^{n} X_j$.


1. Show that equality can hold somewhere in (1) with strictly positive 
probability if and only if the discrete part of F does not vanish. 


*2. Let $F_n$ and $F$ be as in Theorem 5.5.1; then the distribution of

$$\sup_{-\infty < x < \infty} |F_n(x, \omega) - F(x)|$$

is the same for all continuous $F$. [HINT: Consider $F(X)$, where $X$ has the d.f. $F$.]

3. Find the distribution of $Y_{nk}$, $1 \le k \le n$, in (1). [These r.v.'s are called order statistics.]


*4. Let $S_n$ and $N(t)$ be as in Theorem 5.5.2. Show that

$$\mathscr{E}\{N(t)\} = \sum_{n=1}^{\infty} \mathscr{P}\{S_n \le t\}.$$

This remains true if $X_1$ takes both positive and negative values.

5. If $\mathscr{E}(X_1) > 0$, then

$$\lim_{t \to -\infty} \mathscr{P}\left\{ \bigcup_{n=1}^{\infty} [S_n \le t] \right\} = 0.$$

*6. For each $t > 0$, define

$$\nu(t, \omega) = \min\{n : |S_n(\omega)| > t\}$$

if such an $n$ exists, or $+\infty$ if not. If $\mathscr{P}(X_1 \ne 0) > 0$, then for every $t > 0$ and $r > 0$ we have $\mathscr{P}\{\nu(t) > n\} \le \lambda^n$ for some $\lambda < 1$ and all large $n$; consequently $\mathscr{E}\{\nu(t)^r\} < \infty$. This implies the corollary of Theorem 5.5.2 without recourse to the law of large numbers. [This is Charles Stein's theorem.]

*7. Consider the special case of renewal where the r.v.'s are Bernoullian taking the values 1 and 0 with probabilities $p$ and $1 - p$, where $0 < p < 1$. Find explicitly the d.f. of $\nu(0)$ as defined in Exercise 6, and hence of $\nu(t)$ for every $t > 0$. Find $\mathscr{E}\{\nu(t)\}$ and $\mathscr{E}\{\nu(t)^2\}$. Relate $\nu(t, \omega)$ to the $N(t, \omega)$ in Theorem 5.5.2 and so calculate $\mathscr{E}\{N(t)\}$ and $\mathscr{E}\{N(t)^2\}$.

8. Theorem 5.5.3 remains true if $\mathscr{E}(X_1)$ is defined, possibly $+\infty$ or $-\infty$.

*9. In Exercise 7, find the d.f. of $X_{\nu(t)}$ for a given $t$. $\mathscr{E}\{X_{\nu(t)}\}$ is the mean lifespan of the object living at the epoch $t$; should it not be the same as $\mathscr{E}\{X_1\}$, the mean lifespan of the given species? [This is one of the best examples of the use or misuse of intuition in probability theory.]


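10. Let $\tau$ be a positive integer-valued r.v. that is independent of the $X_n$'s. Suppose that both $\tau$ and $X_1$ have finite second moments, then

$$\sigma^2(S_\tau) = \mathscr{E}(\tau)\,\sigma^2(X_1) + \sigma^2(\tau)\,(\mathscr{E}(X_1))^2.$$

*11. Let $f$ be continuous and belong to $L^r(0, \infty)$ for some $r > 1$, and

$$g(\lambda) = \int_0^{\infty} e^{-\lambda t} f(t) \, dt.$$

Then

$$f(x) = \lim_{n \to \infty} \frac{(-1)^{n-1}}{(n-1)!} \left(\frac{n}{x}\right)^n g^{(n-1)}\left(\frac{n}{x}\right),$$

where $g^{(n-1)}$ is the $(n-1)$st derivative of $g$, uniformly in every finite interval. [HINT: Let $\lambda > 0$, $\mathscr{P}\{X_1(\lambda) \le t\} = 1 - e^{-\lambda t}$. Then

$$\mathscr{E}\{f(S_n(\lambda))\} = [(-1)^{n-1}/(n-1)!]\,\lambda^n g^{(n-1)}(\lambda)$$

and $S_n(n/x) \to x$ in pr. This is a somewhat easier version of Widder's inversion formula for Laplace transforms.]

12. Let $\mathscr{P}\{X_1 = k\} = p_k$, $1 \le k \le \ell$, $\sum_{k=1}^{\ell} p_k = 1$. Let $N_k(n, \omega)$ be the number of values of $j$, $1 \le j \le n$, for which $X_j = k$ and

$$\Pi(n, \omega) = \prod_{k=1}^{\ell} p_k^{N_k(n, \omega)}.$$

Prove that

$$\lim_{n \to \infty} \frac{1}{n} \log \Pi(n, \omega) \quad \text{exists a.e.}$$

and find the limit. [This is from information theory.]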


Bibliographical Note 


Borel's theorem on normal numbers, as well as the Borel-Cantelli lemma in Secs. 4.2-4.3, is contained in

Émile Borel, Sur les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo 26 (1909), 247-271.


This pioneering paper, despite some serious gaps (see Fréchet [9] for comments), 
is well worth reading for its historical interest. In Borel’s Jubilé Selecta (Gauthier- 
Villars, 1940), it is followed by a commentary by Paul Lévy with a bibliography on 
later developments. 

Every serious student of probability theory should read: 


A. N. Kolmogoroff, Über die Summen durch den Zufall bestimmten unabhängiger Grössen, Math. Annalen 99 (1928), 309-319; Bemerkungen, 102 (1929), 484-488.


This contains Theorems 5.3.1 to 5.3.3 as well as the original version of Theorem 5.2.3. 
For all convergence questions regarding sums of independent r.v.’s, the deepest 
study is given in Chapter 6 of Lévy’s book [11]. After three decades, this book remains 
a source of inspiration. 
Theorem 5.5.2 is taken from 


J. L. Doob, Renewal theory from the point of view of probability, Trans. Am. Math. 
Soc. 63 (1942), 422-438. 


Feller’s book [13], both volumes, contains an introduction to renewal theory as well 
as some of its latest developments. 


6 Characteristic function 


6.1 General properties; convolutions 


An important tool in the study of r.v.'s and their p.m.'s or d.f.'s is the characteristic function (ch.f.). For any r.v. $X$ with the p.m. $\mu$ and d.f. $F$, this is defined to be the function $f$ on $\mathscr{R}^1$ as follows, $\forall t \in \mathscr{R}^1$:

$$f(t) = \mathscr{E}(e^{itX}) = \int_{\Omega} e^{itX(\omega)} \, \mathscr{P}(d\omega) = \int_{\mathscr{R}^1} e^{itx} \, \mu(dx) = \int_{-\infty}^{\infty} e^{itx} \, dF(x). \tag{1}$$

The equality of the third and fourth terms above is a consequence of Theorem 3.2.2, while the rest is by definition and notation. We remind the reader that the last term in (1) is defined to be the one preceding it, where the one-to-one correspondence between $\mu$ and $F$ is discussed in Sec. 2.2. We shall use both of them below. Let us also point out, since our general discussion of integrals has been confined to the real domain, that $f$ is a complex-valued function of the real variable $t$, whose real and imaginary parts are given respectively by

$$\mathrm{R}f(t) = \int \cos xt \, \mu(dx), \qquad \mathrm{I}f(t) = \int \sin xt \, \mu(dx).$$

Here and hereafter, integrals without an indicated domain of integration are taken over $\mathscr{R}^1$.

Clearly, the ch.f. is a creature associated with $\mu$ or $F$, not with $X$, but the first equation in (1) will prove very convenient for us; see e.g. (iii) and (v) below. In analysis, the ch.f. is known as the Fourier-Stieltjes transform of $\mu$ or $F$. It can also be defined over a wider class of $\mu$ or $F$ and, furthermore, be considered as a function of a complex variable $t$, under certain conditions that ensure the existence of the integrals in (1). This extension is important in some applications, but we will not need it here except in Theorem 6.6.5. As specified above, it is always well defined (for all real $t$) and has the following simple properties.


(i) $\forall t \in \mathscr{R}^1$:

$$|f(t)| \le 1 = f(0); \qquad f(-t) = \overline{f(t)},$$

where $\bar{z}$ denotes the conjugate complex of $z$.

(ii) $f$ is uniformly continuous in $\mathscr{R}^1$.

To see this, we write for real $t$ and $h$:

$$f(t + h) - f(t) = \int (e^{i(t+h)x} - e^{itx}) \, \mu(dx),$$

$$|f(t + h) - f(t)| \le \int |e^{itx}| \, |e^{ihx} - 1| \, \mu(dx) = \int |e^{ihx} - 1| \, \mu(dx).$$

The last integrand is bounded by 2 and tends to 0 as $h \to 0$, for each $x$. Hence, the integral converges to 0 by bounded convergence. Since it does not involve $t$, the convergence is surely uniform with respect to $t$.

(iii) If we write $f_X$ for the ch.f. of $X$, then for any real numbers $a$ and $b$, we have

$$f_{aX+b}(t) = f_X(at)\,e^{itb},$$
$$f_{-X}(t) = \overline{f_X(t)}.$$

This is easily seen from the first equation in (1), for

$$\mathscr{E}(e^{it(aX+b)}) = \mathscr{E}(e^{i(ta)X} \cdot e^{itb}) = \mathscr{E}(e^{i(ta)X})\,e^{itb}.$$


(iv) If $\{f_n, n \ge 1\}$ are ch.f.'s, $\lambda_n \ge 0$, $\sum_{n=1}^{\infty} \lambda_n = 1$, then

$$\sum_{n=1}^{\infty} \lambda_n f_n$$

is a ch.f. Briefly: a convex combination of ch.f.'s is a ch.f.

For if $\{\mu_n, n \ge 1\}$ are the corresponding p.m.'s, then $\sum_{n=1}^{\infty} \lambda_n \mu_n$ is a p.m. whose ch.f. is $\sum_{n=1}^{\infty} \lambda_n f_n$.

(v) If $\{f_j, 1 \le j \le n\}$ are ch.f.'s, then

$$\prod_{j=1}^{n} f_j$$

is a ch.f.

By Theorem 3.3.4, there exist independent r.v.'s $\{X_j, 1 \le j \le n\}$ with probability distributions $\{\mu_j, 1 \le j \le n\}$, where $\mu_j$ is as in (iv). Letting

$$S_n = \sum_{j=1}^{n} X_j,$$

we have by the corollary to Theorem 3.3.3:

$$\mathscr{E}(e^{itS_n}) = \mathscr{E}\left( \prod_{j=1}^{n} e^{itX_j} \right) = \prod_{j=1}^{n} \mathscr{E}(e^{itX_j}) = \prod_{j=1}^{n} f_j(t);$$

or in the notation of (iii):

$$f_{S_n} = \prod_{j=1}^{n} f_{X_j}. \tag{2}$$

(For an extension to an infinite number of $f_j$'s see Exercise 4 below.)

The ch.f. of $S_n$ being so neatly expressed in terms of the ch.f.'s of the summands, we may wonder about the d.f. of $S_n$. We need the following definitions.


DEFINITION. The convolution of two d.f.'s $F_1$ and $F_2$ is defined to be the d.f. $F$ such that

$$\forall x \in \mathscr{R}^1: F(x) = \int_{-\infty}^{\infty} F_1(x - y) \, dF_2(y), \tag{3}$$

and written as

$$F = F_1 * F_2.$$

It is easy to verify that $F$ is indeed a d.f. The other basic properties of convolution are consequences of the following theorem.

Theorem 6.1.1. Let $X_1$ and $X_2$ be independent r.v.'s with d.f.'s $F_1$ and $F_2$, respectively. Then $X_1 + X_2$ has the d.f. $F_1 * F_2$.

PROOF. We wish to show that

$$\forall x: \mathscr{P}\{X_1 + X_2 \le x\} = (F_1 * F_2)(x). \tag{4}$$

For this purpose we define a function $f$ of $(x_1, x_2)$ as follows, for fixed $x$:

$$f(x_1, x_2) = \begin{cases} 1, & \text{if } x_1 + x_2 \le x; \\ 0, & \text{otherwise.} \end{cases}$$

$f$ is a Borel measurable function of two variables. By Theorem 3.3.3 and using the notation of the second proof of Theorem 3.3.3, we have

$$\begin{aligned}
\int_{\Omega} f(X_1, X_2) \, d\mathscr{P} &= \iint_{\mathscr{R}^2} f(x_1, x_2) \, \mu^2(dx_1, dx_2) \\
&= \int_{\mathscr{R}^1} \mu_2(dx_2) \int_{\mathscr{R}^1} f(x_1, x_2) \, \mu_1(dx_1) \\
&= \int_{\mathscr{R}^1} \mu_2(dx_2) \int_{(-\infty, x - x_2]} \mu_1(dx_1) \\
&= \int_{-\infty}^{\infty} dF_2(x_2) \, F_1(x - x_2).
\end{aligned}$$

This reduces to (4). The second equation above, evaluating the double integral by an iterated one, is an application of Fubini's theorem (see Sec. 3.3).


Corollary. The binary operation of convolution $*$ is commutative and associative.

For the corresponding binary operation of addition of independent r.v.'s has these two properties.


DEFINITION. The convolution of two probability density functions $p_1$ and $p_2$ is defined to be the probability density function $p$ such that

$$\forall x \in \mathscr{R}^1: p(x) = \int_{-\infty}^{\infty} p_1(x - y)\,p_2(y) \, dy, \tag{5}$$

and written as

$$p = p_1 * p_2.$$

We leave it to the reader to verify that $p$ is indeed a density, but we will spell out the following connection.

Theorem 6.1.2. The convolution of two absolutely continuous d.f.'s with densities $p_1$ and $p_2$ is absolutely continuous with density $p_1 * p_2$.

PROOF. We have by Fubini's theorem:

$$\begin{aligned}
\int_{-\infty}^{x} p(u) \, du &= \int_{-\infty}^{x} du \int_{-\infty}^{\infty} p_1(u - v)\,p_2(v) \, dv \\
&= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{x} p_1(u - v) \, du \right] p_2(v) \, dv \\
&= \int_{-\infty}^{\infty} F_1(x - v)\,p_2(v) \, dv \\
&= \int_{-\infty}^{\infty} F_1(x - v) \, dF_2(v) = (F_1 * F_2)(x).
\end{aligned}$$

This shows that $p$ is a density of $F_1 * F_2$.
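Formula (5) can be approximated by a Riemann sum. As a minimal sketch, the code below convolves the uniform density on $[0, 1]$ with itself and compares the result with the triangular density on $[0, 2]$, which is the known answer in this case. The midpoint rule, the integration window, and the function names are assumptions made for illustration.

```python
def convolve_density(p1, p2, x, lo, hi, steps=4000):
    """Midpoint-rule approximation of (p1 * p2)(x) = integral of p1(x - y) p2(y) dy.

    [lo, hi] is assumed to contain the set where p2 is nonzero.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += p1(x - y) * p2(y)
    return total * h

uniform = lambda u: 1.0 if 0.0 <= u <= 1.0 else 0.0     # density of U[0, 1]
triangular = lambda x: max(0.0, 1.0 - abs(x - 1.0))     # density of the sum of two U[0, 1]

for x in (0.25, 1.0, 1.5):
    print(x, convolve_density(uniform, uniform, x, 0.0, 1.0), triangular(x))
```

The two printed columns should agree up to the discretization error of the sum, which is of the order of the step size because the integrand has jumps.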


What is the p.m., to be denoted by $\mu_1 * \mu_2$, that corresponds to $F_1 * F_2$? For arbitrary subsets $A$ and $B$ of $\mathscr{R}^1$, we denote their vector sum and difference by $A + B$ and $A - B$, respectively:

$$A \pm B = \{x \pm y : x \in A, y \in B\}; \tag{6}$$

and write $x \pm B$ for $\{x\} \pm B$, $-B$ for $0 - B$. There should be no danger of confusing $A - B$ with $A \setminus B$.


Theorem 6.1.3. For each $B \in \mathscr{B}^1$, we have

$$(\mu_1 * \mu_2)(B) = \int_{\mathscr{R}^1} \mu_1(B - y) \, \mu_2(dy). \tag{7}$$

For each Borel measurable function $g$ that is integrable with respect to $\mu_1 * \mu_2$, we have

$$\int_{\mathscr{R}^1} g(u) \, (\mu_1 * \mu_2)(du) = \iint_{\mathscr{R}^1 \times \mathscr{R}^1} g(x + y) \, \mu_1(dx) \, \mu_2(dy). \tag{8}$$

PROOF. It is easy to verify that the set function $(\mu_1 * \mu_2)(\cdot)$ defined by (7) is a p.m. To show that its d.f. is $F_1 * F_2$, we need only verify that its value for $B = (-\infty, x]$ is given by the $F(x)$ defined in (3). This is obvious, since the right side of (7) then becomes

$$\int_{\mathscr{R}^1} F_1(x - y) \, \mu_2(dy) = \int_{-\infty}^{\infty} F_1(x - y) \, dF_2(y).$$

Now let $g$ be the indicator of the set $B$, then for each $y$, the function $g_y$ defined by $g_y(x) = g(x + y)$ is the indicator of the set $B - y$. Hence

$$\int_{\mathscr{R}^1} g(x + y) \, \mu_1(dx) = \mu_1(B - y)$$

and, substituting into the right side of (8), we see that it reduces to (7) in this case. The general case is proved in the usual way by first considering simple functions $g$ and then passing to the limit for integrable functions.


As an instructive example, let us calculate the ch.f. of the convolution $\mu_1 * \mu_2$. We have by (8)

$$\begin{aligned}
\int e^{itu} \, (\mu_1 * \mu_2)(du) &= \iint e^{ity} e^{itx} \, \mu_1(dx) \, \mu_2(dy) \\
&= \int e^{itx} \, \mu_1(dx) \int e^{ity} \, \mu_2(dy).
\end{aligned}$$

This is as it should be by (v), since the first term above is the ch.f. of $X + Y$, where $X$ and $Y$ are independent with $\mu_1$ and $\mu_2$ as p.m.'s. Let us restate the results, after an obvious induction, as follows.


Theorem 6.1.4. Addition of (a finite number of) independent r.v.’s corre- 
sponds to convolution of their d.f.’s and multiplication of their ch.f.’s. 


Corollary. If $f$ is a ch.f., then so is $|f|^2$.

To prove the corollary, let $X$ have the ch.f. $f$. Then there exists on some $\Omega$ (why?) an r.v. $Y$ independent of $X$ and having the same d.f., and so also the same ch.f. $f$. The ch.f. of $X - Y$ is

$$\mathscr{E}(e^{it(X-Y)}) = \mathscr{E}(e^{itX})\,\mathscr{E}(e^{-itY}) = f(t)f(-t) = |f(t)|^2.$$

The technique of considering $X - Y$ and $|f|^2$ instead of $X$ and $f$ will be used below and referred to as "symmetrization" (see the end of Sec. 6.2). This is often expedient, since a real and particularly a positive-valued ch.f. such as $|f|^2$ is easier to handle than a general one.
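Theorem 6.1.4 and its corollary can be verified exactly for distributions with finitely many atoms. In the sketch below a p.m. is represented as a dictionary from atoms to masses (a representation assumed here only for convenience); the helper names `convolve_pmf` and `chf` are likewise illustrative.

```python
import cmath

def convolve_pmf(a, b):
    """Convolution of two discrete p.m.'s given as {atom: mass} dictionaries."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def chf(pmf, t):
    """Characteristic function sum_x mass(x) * exp(i t x) of a discrete p.m."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

p = 0.3
bern = {0: 1 - p, 1: p}            # Bernoullian distribution
binom = bern
for _ in range(4):                 # 5-fold convolution: binomial with n = 5
    binom = convolve_pmf(binom, bern)

t = 0.7
lhs = chf(binom, t)                # ch.f. of the sum
rhs = chf(bern, t) ** 5            # product of the ch.f.'s

# Symmetrization: X - Y with Y an independent copy of X has ch.f. |f|^2.
neg = {-x: px for x, px in binom.items()}
sym_chf = chf(convolve_pmf(binom, neg), t)
print(abs(lhs - rhs), abs(sym_chf - abs(lhs) ** 2))
```

Both printed differences should be at the level of floating-point rounding, and the symmetrized ch.f. should have a vanishing imaginary part.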

Let us list a few well-known ch.f.’s together with their d.f.’s or p.d.’s 
(probability densities), the last being given in the interval outside of which 
they vanish. 


(1) Point mass at $a$:

d.f. $\delta_a$; ch.f. $e^{iat}$.

(2) Symmetric Bernoullian distribution with mass $\frac{1}{2}$ each at $+1$ and $-1$:

d.f. $\frac{1}{2}(\delta_1 + \delta_{-1})$; ch.f. $\cos t$.

(3) Bernoullian distribution with "success probability" $p$, and $q = 1 - p$:

d.f. $q\delta_0 + p\delta_1$; ch.f. $q + pe^{it} = 1 + p(e^{it} - 1)$.

(4) Binomial distribution for $n$ trials with success probability $p$:

d.f. $\sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \delta_k$; ch.f. $(q + pe^{it})^n$.

(5) Geometric distribution with success probability $p$:

d.f. $\sum_{n=0}^{\infty} q^n p\,\delta_n$; ch.f. $p(1 - qe^{it})^{-1}$.

(6) Poisson distribution with (mean) parameter $\lambda$:

d.f. $\sum_{n=0}^{\infty} e^{-\lambda} \frac{\lambda^n}{n!} \delta_n$; ch.f. $e^{\lambda(e^{it} - 1)}$.

(7) Exponential distribution with mean $\lambda^{-1}$:

p.d. $\lambda e^{-\lambda x}$ in $[0, \infty)$; ch.f. $(1 - \lambda^{-1}it)^{-1}$.

(8) Uniform distribution in $[-a, +a]$:

p.d. $\frac{1}{2a}$ in $[-a, a]$; ch.f. $\frac{\sin at}{at}$ ($= 1$ for $t = 0$).

(9) Triangular distribution in $[-a, a]$:

p.d. $\frac{a - |x|}{a^2}$ in $[-a, a]$; ch.f. $\frac{2(1 - \cos at)}{a^2 t^2} = \left( \frac{\sin \frac{at}{2}}{\frac{at}{2}} \right)^2$.

(10) Reciprocal of (9):

p.d. $\frac{1 - \cos ax}{\pi a x^2}$ in $(-\infty, \infty)$; ch.f. $\left( 1 - \frac{|t|}{a} \right) \vee 0$.

(11) Normal distribution $N(m, \sigma^2)$ with mean $m$ and variance $\sigma^2$:

p.d. $\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - m)^2}{2\sigma^2} \right]$ in $(-\infty, \infty)$; ch.f. $\exp\left( imt - \frac{\sigma^2 t^2}{2} \right)$.

Unit normal distribution $N(0, 1) = \Phi$ with mean 0 and variance 1:

p.d. $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ in $(-\infty, \infty)$; ch.f. $e^{-t^2/2}$.

(12) Cauchy distribution with parameter $a > 0$:

p.d. $\frac{a}{\pi(a^2 + x^2)}$ in $(-\infty, \infty)$; ch.f. $e^{-a|t|}$.
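Entries of this table for discrete distributions can be checked by summing the defining series in (1). The sketch below does so for the Poisson and geometric cases; the parameter values, the truncation at 80 terms, and the helper name `chf_series` are assumptions chosen for the example.

```python
import cmath
import math

def chf_series(pmf, t, terms=80):
    """Truncated series sum_{n < terms} pmf(n) * exp(i t n)."""
    return sum(pmf(n) * cmath.exp(1j * t * n) for n in range(terms))

lam, p = 2.5, 0.4
q = 1.0 - p
poisson = lambda n: math.exp(-lam) * lam**n / math.factorial(n)   # entry (6)
geometric = lambda n: q**n * p                                     # entry (5)

t = 1.3
poisson_err = abs(chf_series(poisson, t) - cmath.exp(lam * (cmath.exp(1j * t) - 1)))
geometric_err = abs(chf_series(geometric, t) - p / (1 - q * cmath.exp(1j * t)))
print(poisson_err, geometric_err)
```

With these parameters the neglected tails are far below rounding error, so both differences should be negligible.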




Convolution is a smoothing operation widely used in mathematical analysis, for instance in the proof of Theorem 6.5.2 below. Convolution with the normal kernel is particularly effective, as illustrated below.

Let $n_\delta$ be the density of the normal distribution $N(0, \delta^2)$, namely

$$n_\delta(x) = \frac{1}{\sqrt{2\pi}\,\delta} \exp\left( -\frac{x^2}{2\delta^2} \right), \quad -\infty < x < \infty.$$

For any bounded measurable function $f$ on $\mathscr{R}^1$, put

$$f_\delta(x) = (f * n_\delta)(x) = \int_{-\infty}^{\infty} f(x - y)\,n_\delta(y) \, dy = \int_{-\infty}^{\infty} n_\delta(x - y)\,f(y) \, dy. \tag{9}$$

It will be seen below that the integrals above converge. Let $C_B^\infty$ denote the class of functions on $\mathscr{R}^1$ which have bounded derivatives of all orders; $C_U$ the class of bounded and uniformly continuous functions on $\mathscr{R}^1$.


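Theorem 6.1.5. For each $\delta > 0$, we have $f_\delta \in C_B^\infty$. Furthermore if $f \in C_U$, then $f_\delta \to f$ uniformly in $\mathscr{R}^1$.

PROOF. It is easily verified that $n_\delta \in C_B^\infty$. Moreover its $k$th derivative $n_\delta^{(k)}$ is dominated by $c_{k,\delta}\,n_{2\delta}$ where $c_{k,\delta}$ is a constant depending only on $k$ and $\delta$ so that

$$\left| \int_{-\infty}^{\infty} n_\delta^{(k)}(x - y)\,f(y) \, dy \right| \le c_{k,\delta} \|f\| \int_{-\infty}^{\infty} n_{2\delta}(x - y) \, dy = c_{k,\delta} \|f\|.$$

Thus the first assertion follows by differentiation under the integral of the last term in (9), which is justified by elementary rules of calculus. The second assertion is proved by standard estimation as follows, for any $\eta > 0$:

$$\begin{aligned}
|f(x) - f_\delta(x)| &\le \int_{-\infty}^{\infty} |f(x) - f(x - y)|\,n_\delta(y) \, dy \\
&\le \sup_{|y| \le \eta} |f(x) - f(x - y)| + 2\|f\| \int_{|y| > \eta} n_\delta(y) \, dy.
\end{aligned}$$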


Here is the probability idea involved. If $f$ is integrable over $\mathscr{R}^1$, we may think of $f$ as the density of an r.v. $X$, and $n_\delta$ as that of an independent normal r.v. $Y_\delta$. Then $f_\delta$ is the density of $X + Y_\delta$ by Theorem 6.1.2. As $\delta \downarrow 0$, $Y_\delta$ converges to 0 in probability and so $X + Y_\delta$ converges to $X$ likewise, hence also in distribution by Theorem 4.4.5. This makes it plausible that the densities will also converge under certain analytical conditions.

As a corollary, we have shown that the class $C_B^\infty$ is dense in the class $C_U$ with respect to the uniform topology on $\mathscr{R}^1$. This is a basic result in the theory of "generalized functions" or "Schwartz distributions". By way of application, we state the following strengthening of Theorem 4.4.1.
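The uniform convergence $f_\delta \to f$ asserted in Theorem 6.1.5 can be observed numerically. This sketch approximates the integral in (9) by a midpoint rule over $|y| \le 8\delta$ and records the largest deviation on a grid for three values of $\delta$. The test function $f(x) = \min(|x|, 1)$, the grid, and the truncation width are assumptions made for illustration.

```python
import math

def normal_density(y, delta):
    """Density n_delta of N(0, delta^2)."""
    return math.exp(-y * y / (2 * delta * delta)) / (math.sqrt(2 * math.pi) * delta)

def smooth(f, x, delta, steps=400, width=8.0):
    """Midpoint-rule approximation of f_delta(x) = integral of f(x - y) n_delta(y) dy,
    truncated to |y| <= width * delta (the neglected normal tail is tiny)."""
    a = width * delta
    h = 2 * a / steps
    total = 0.0
    for i in range(steps):
        y = -a + (i + 0.5) * h
        total += f(x - y) * normal_density(y, delta)
    return total * h

f = lambda x: min(abs(x), 1.0)                 # bounded and uniformly continuous
grid = [-2.0 + 0.05 * i for i in range(81)]
errors = [max(abs(f(x) - smooth(f, x, d)) for x in grid) for d in (0.5, 0.1, 0.02)]
print(errors)                                   # should decrease with delta
```

Since this $f$ is Lipschitz with constant 1, the deviation is at most $\mathscr{E}|Y_\delta| = \delta\sqrt{2/\pi}$, which matches the roughly linear decay one should see in the output.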


Theorem 6.1.6. If $\{\mu_n\}$ and $\mu$ are s.p.m.'s such that

$$\forall f \in C_B^\infty: \int_{\mathscr{R}^1} f(x) \, \mu_n(dx) \to \int_{\mathscr{R}^1} f(x) \, \mu(dx),$$

then $\mu_n \xrightarrow{v} \mu$.

This is an immediate consequence of Theorem 4.4.1 and Theorem 6.1.5, if we observe that $C_0 \subset C_U$. The reduction of the class of "test functions" from $C_0$ to $C_B^\infty$ is often expedient, as in Lindeberg's method for proving central limit theorems; see Sec. 7.1 below.


EXERCISES 


1. If $f$ is a ch.f., and $G$ a d.f. with $G(0-) = 0$, then the following functions are all ch.f.'s:

$$\int_0^1 f(ut) \, du, \quad \int_0^{\infty} f(ut)\,e^{-u} \, du, \quad \int_0^{\infty} e^{-|t|u} \, dG(u),$$

$$\int_0^{\infty} e^{-t^2 u} \, dG(u), \quad \int_0^{\infty} f(ut) \, dG(u).$$

*2. Let $f(u, t)$ be a function on $(-\infty, \infty) \times (-\infty, \infty)$ such that for each $u$, $f(u, \cdot)$ is a ch.f. and for each $t$, $f(\cdot, t)$ is a continuous function; then

$$\int_{-\infty}^{\infty} f(u, t) \, dG(u)$$

is a ch.f. for any d.f. $G$. In particular, if $f$ is a ch.f. such that $\lim_{t \to \infty} f(t)$ exists and $G$ a d.f. with $G(0-) = 0$, then

$$\int_0^{\infty} f\left(\frac{t}{u}\right) dG(u) \quad \text{is a ch.f.}$$

3. Find the d.f. with the following ch.f.'s ($\alpha > 0$, $\beta > 0$):

$$\frac{\alpha^2}{\alpha^2 + t^2}, \quad \frac{1}{(1 - \alpha it)^{\beta}}, \quad \frac{1}{(1 + \alpha\beta - \alpha\beta e^{it})^{1/\beta}}.$$

[HINT: The second and third steps correspond respectively to the gamma and Pólya distributions.]

4. Let $S_n$ be as in (v) and suppose that $S_n \to S_\infty$ in pr. Prove that

$$\prod_{j=1}^{\infty} f_j(t)$$

converges in the sense of infinite product for each $t$ and is the ch.f. of $S_\infty$.

5. If $F_1$ and $F_2$ are d.f.'s such that

$$F_1 = \sum_j b_j \delta_{a_j}$$

and $F_2$ has density $p$, show that $F_1 * F_2$ has a density and find it.

*6. Prove that the convolution of two discrete d.f.'s is discrete; that of a continuous d.f. with any d.f. is continuous; that of an absolutely continuous d.f. with any d.f. is absolutely continuous.

7. The convolution of two discrete distributions with exactly m and n 
atoms, respectively, has at least m+n — 1 and at most mn atoms. 

8. Show that the family of normal (Cauchy, Poisson) distributions is 
closed with respect to convolution in the sense that the convolution of any 
two in the family with arbitrary parameters is another in the family with some 
parameter(s). 

9. Find the nth iterated convolution of an exponential distribution. 

*10. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s having the common exponential distribution with mean $1/\lambda$, $\lambda > 0$. For given $x > 0$ let $\nu$ be the maximum of $n$ such that $S_n \le x$, where $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$ as usual. Prove that the r.v. $\nu$ has the Poisson distribution with mean $\lambda x$. See Sec. 5.5 for an interpretation by renewal theory.

11. Let $X$ have the normal distribution $\Phi$. Find the d.f., p.d., and ch.f. of $X^2$.

12. Let $\{X_j, 1 \le j \le n\}$ be independent r.v.'s each having the d.f. $\Phi$. Find the ch.f. of

$$\sum_{j=1}^{n} X_j^2$$

and show that the corresponding p.d. is $2^{-n/2}\,\Gamma(n/2)^{-1} x^{(n/2)-1} e^{-x/2}$ in $(0, \infty)$. This is called in statistics the "$\chi^2$ distribution with $n$ degrees of freedom".

13. For any ch.f. $f$ we have for every $t$:

$$\mathrm{R}[1 - f(t)] \ge \tfrac{1}{4}\,\mathrm{R}[1 - f(2t)].$$

14. Find an example of two r.v.'s $X$ and $Y$ with the same p.m. $\mu$ that are not independent but such that $X + Y$ has the p.m. $\mu * \mu$. [HINT: Take $X = Y$ and use ch.f.]

*15. For a d.f. $F$ and $h > 0$, define

$$Q_F(h) = \sup_x [F(x + h) - F(x-)];$$

$Q_F$ is called the Lévy concentration function of $F$. Prove that the sup above is attained, and if $G$ is also a d.f., we have

$$\forall h > 0: Q_{F*G}(h) \le Q_F(h) \wedge Q_G(h).$$

16. If $0 < h\lambda \le 2\pi$, then there is an absolute constant $A$ such that

$$Q_F(h) \le \frac{A}{\lambda} \int_0^{\lambda} |f(t)| \, dt,$$

where $f$ is the ch.f. of $F$. [HINT: Use Exercise 2 of Sec. 6.2 below.]

17. Let $F$ be a symmetric d.f., with ch.f. $f \ge 0$; then

$$\varphi_F(h) = \int_{-\infty}^{\infty} \frac{h^2}{h^2 + x^2} \, dF(x) = h \int_0^{\infty} e^{-ht} f(t) \, dt$$

is a sort of average concentration function. Prove that if $G$ is also a d.f. with ch.f. $g \ge 0$, then we have $\forall h > 0$:

$$\varphi_{F*G}(h) \le \varphi_F(h) \wedge \varphi_G(h);$$
$$1 - \varphi_{F*G}(h) \le [1 - \varphi_F(h)] + [1 - \varphi_G(h)].$$

*18. Let the support of the p.m. $\mu$ on $\mathscr{R}^1$ be denoted by supp $\mu$. Prove that

$$\text{supp}\,(\mu * \nu) = \text{closure of supp}\,\mu + \text{supp}\,\nu;$$
$$\text{supp}\,(\mu_1 * \mu_2 * \cdots) = \text{closure of } (\text{supp}\,\mu_1 + \text{supp}\,\mu_2 + \cdots)$$

where "+" denotes vector sum.


6.2 Uniqueness and inversion 


To study the deeper properties of Fourier-Stieltjes transforms, we shall need certain "Dirichlet integrals". We begin with three basic formulas, where "sgn $\alpha$" denotes 1, 0 or $-1$, according as $\alpha > 0$, $= 0$, or $< 0$.

$$\forall y \ge 0: 0 \le (\text{sgn}\,\alpha) \int_0^y \frac{\sin \alpha x}{x} \, dx \le \int_0^{\pi} \frac{\sin x}{x} \, dx. \tag{1}$$

$$\int_0^{\infty} \frac{\sin \alpha x}{x} \, dx = \frac{\pi}{2}\,\text{sgn}\,\alpha. \tag{2}$$

$$\int_0^{\infty} \frac{1 - \cos \alpha x}{x^2} \, dx = \frac{\pi}{2}|\alpha|. \tag{3}$$

The substitution $\alpha x = u$ shows at once that it is sufficient to prove all three formulas for $\alpha = 1$. The inequality (1) is proved by partitioning the interval $[0, \infty)$ with positive multiples of $\pi$ so as to convert the integral into a series of alternating signs and decreasing moduli. The integral in (2) is a standard exercise in contour integration, as is also that in (3). However, we shall indicate the following neat heuristic calculations, leaving the justifications, which are not difficult, as exercises.

$$\int_0^{\infty} \frac{\sin x}{x} \, dx = \int_0^{\infty} \sin x \left[ \int_0^{\infty} e^{-xu} \, du \right] dx = \int_0^{\infty} \left[ \int_0^{\infty} e^{-xu} \sin x \, dx \right] du = \int_0^{\infty} \frac{du}{1 + u^2} = \frac{\pi}{2},$$

$$\int_0^{\infty} \frac{1 - \cos x}{x^2} \, dx = \int_0^{\infty} \frac{1}{x^2} \left[ \int_0^x \sin u \, du \right] dx = \int_0^{\infty} \sin u \left[ \int_u^{\infty} \frac{dx}{x^2} \right] du = \int_0^{\infty} \frac{\sin u}{u} \, du = \frac{\pi}{2}.$$


We are ready to answer the question: given a ch.f. $f$, how can we find the corresponding d.f. $F$ or p.m. $\mu$? The formula for doing this, called the inversion formula, is of theoretical importance, since it will establish a one-to-one correspondence between the class of d.f.'s or p.m.'s and the class of ch.f.'s (see, however, Exercise 12 below). It is somewhat complicated in its most general form, but special cases or variants of it can actually be employed to derive certain properties of a d.f. or p.m. from its ch.f.; see, e.g., (14) and (15) of Sec. 6.4.


Theorem 6.2.1. If $x_1 < x_2$, then we have

$$\mu((x_1, x_2)) + \tfrac{1}{2}\mu(\{x_1\}) + \tfrac{1}{2}\mu(\{x_2\}) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it} f(t) \, dt \tag{4}$$

(the integrand being defined by continuity at $t = 0$).

PROOF. Observe first that the integrand above is bounded by $|x_1 - x_2|$ everywhere and is $O(|t|^{-1})$ as $|t| \to \infty$; yet we cannot assert that the "infinite integral" $\int_{-\infty}^{\infty}$ exists (in the Lebesgue sense). Indeed, it does not in general (see Exercise 9 below). The fact that the indicated limit, the so-called Cauchy limit, does exist is part of the assertion.

We shall prove (4) by actually substituting the definition of $f$ into (4) and carrying out the integrations. We have

$$\begin{aligned}
\frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it} \left[ \int_{-\infty}^{\infty} e^{itx} \, \mu(dx) \right] dt
= \int_{-\infty}^{\infty} \left[ \int_{-T}^{T} \frac{e^{it(x - x_1)} - e^{it(x - x_2)}}{2\pi it} \, dt \right] \mu(dx).
\end{aligned} \tag{5}$$

Leaving aside for the moment the justification of the interchange of the iterated integral, let us denote the quantity in square brackets above by $I(T, x, x_1, x_2)$. Trivial simplification yields

$$I(T, x, x_1, x_2) = \frac{1}{\pi} \int_0^T \frac{\sin t(x - x_1)}{t} \, dt - \frac{1}{\pi} \int_0^T \frac{\sin t(x - x_2)}{t} \, dt.$$

It follows from (2) that

$$\lim_{T \to \infty} I(T, x, x_1, x_2) = \begin{cases}
-\frac{1}{2} - (-\frac{1}{2}) = 0 & \text{for } x < x_1, \\
0 - (-\frac{1}{2}) = \frac{1}{2} & \text{for } x = x_1, \\
\frac{1}{2} - (-\frac{1}{2}) = 1 & \text{for } x_1 < x < x_2, \\
\frac{1}{2} - 0 = \frac{1}{2} & \text{for } x = x_2, \\
\frac{1}{2} - \frac{1}{2} = 0 & \text{for } x > x_2.
\end{cases}$$

Furthermore, $I$ is bounded in $T$ by (1). Hence we may let $T \to \infty$ under the integral sign in the right member of (5) by bounded convergence, since

$$|I(T, x, x_1, x_2)| \le \frac{2}{\pi} \int_0^{\pi} \frac{\sin x}{x} \, dx$$

by (1). The result is

$$\left\{ \int_{(-\infty, x_1)} 0 + \int_{\{x_1\}} \frac{1}{2} + \int_{(x_1, x_2)} 1 + \int_{\{x_2\}} \frac{1}{2} + \int_{(x_2, \infty)} 0 \right\} \mu(dx) = \tfrac{1}{2}\mu(\{x_1\}) + \mu((x_1, x_2)) + \tfrac{1}{2}\mu(\{x_2\}).$$

This proves the theorem. For the justification mentioned above, we invoke Fubini's theorem and observe that

$$\left| \frac{e^{it(x - x_1)} - e^{it(x - x_2)}}{it} \right| = \left| \int_{x_1}^{x_2} e^{-itu} \, du \right| \le |x_1 - x_2|,$$

where the integral is taken along the real axis, and

$$\int_{\mathscr{R}^1} \int_{-T}^{T} |x_1 - x_2| \, dt \, \mu(dx) \le 2T|x_1 - x_2| < \infty,$$

so that the integrand on the right of (5) is dominated by a finitely integrable function with respect to the finite product measure $dt \cdot \mu(dx)$ on $[-T, +T] \times \mathscr{R}^1$. This suffices.


Remark. If $x_1$ and $x_2$ are points of continuity of $F$, the left side of (4) is $F(x_2) - F(x_1)$.
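The inversion formula (4) lends itself to a direct numerical experiment. The sketch below approximates the integral over $[-T, T]$ by a midpoint rule, first for the unit normal ch.f., where the answer is $\Phi(x_2) - \Phi(x_1)$, and then for a Bernoullian ch.f. with both endpoints at atoms, where (4) gives half the mass of each atom. The function name `inversion`, the choices of $T$ and step count, and the value $p = 0.3$ are assumptions made for the example.

```python
import cmath
import math

def inversion(chf, x1, x2, T, steps):
    """Midpoint-rule value of (1/2pi) * integral over [-T, T] of
    (e^{-i t x1} - e^{-i t x2}) / (i t) * chf(t) dt, as in (4).
    steps should be even so that no midpoint falls on t = 0."""
    h = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        t = -T + (i + 0.5) * h
        kernel = (cmath.exp(-1j * t * x1) - cmath.exp(-1j * t * x2)) / (1j * t)
        total += (kernel * chf(t)).real    # imaginary parts cancel by symmetry
    return total * h / (2.0 * math.pi)

normal_chf = lambda t: math.exp(-t * t / 2.0)
p, q = 0.3, 0.7
bernoulli_chf = lambda t: q + p * cmath.exp(1j * t)

normal_value = inversion(normal_chf, -1.0, 2.0, T=40.0, steps=40000)
normal_exact = 0.5 * (math.erf(2.0 / math.sqrt(2.0)) - math.erf(-1.0 / math.sqrt(2.0)))

# Both endpoints are atoms: (4) gives 0 + q/2 + p/2 = 1/2.
atom_value = inversion(bernoulli_chf, 0.0, 1.0, T=200.0, steps=40000)
print(normal_value, normal_exact, atom_value)
```

The normal case converges very quickly because its ch.f. is integrable; the Bernoullian case converges only like $1/T$, which illustrates why the limit in (4) must be taken as a Cauchy limit.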


The following result is often referred to as the "uniqueness theorem" for the "determining" $\mu$ or $F$ (see also Exercise 12 below).

Theorem 6.2.2. If two p.m.'s or d.f.'s have the same ch.f., then they are the same.

PROOF. If neither $x_1$ nor $x_2$ is an atom of $\mu$, the inversion formula (4) shows that the value of $\mu$ on the interval $(x_1, x_2)$ is determined by its ch.f. It follows that two p.m.'s having the same ch.f. agree on each interval whose endpoints are not atoms for either measure. Since each p.m. has only a countable set of atoms, points of $\mathscr{R}^1$ that are not atoms for either measure form a dense set. Thus the two p.m.'s agree on a dense set of intervals, and therefore they are identical by the corollary to Theorem 2.2.3.


We give next an important particular case of Theorem 6.2.1. 


Theorem 6.2.3. If $f \in L^1(-\infty, +\infty)$, then $F$ is continuously differentiable, and we have

$$F'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixt} f(t) \, dt. \tag{6}$$

PROOF. Applying (4) for $x_2 = x$ and $x_1 = x - h$ with $h > 0$ and using $F$ instead of $\mu$, we have

$$\frac{F(x) + F(x-)}{2} - \frac{F(x - h) + F(x - h-)}{2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ith} - 1}{it} e^{-itx} f(t) \, dt.$$

Here the infinite integral exists by the hypothesis on $f$, since the integrand above is dominated by $|hf(t)|$. Hence we may let $h \to 0$ under the integral sign by dominated convergence and conclude that the left side is 0. Thus, $F$ is left continuous and so continuous in $\mathscr{R}^1$. Now we can write

$$\frac{F(x) - F(x - h)}{h} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ith} - 1}{ith} e^{-itx} f(t) \, dt.$$

The same argument as before shows that the limit exists as $h \to 0$. Hence $F$ has a left-hand derivative at $x$ equal to the right member of (6), the latter being clearly continuous [cf. Proposition (ii) of Sec. 6.1]. Similarly, $F$ has a right-hand derivative given by the same formula. Actually it is known for a continuous function that, if one of the four "derivates" exists and is continuous at a point, then the function is continuously differentiable there (see, e.g., Titchmarsh, The theory of functions, 2nd ed., Oxford Univ. Press, New York, 1939, p. 355).


The derivative $F'$ being continuous, we have (why?)

$$\forall x: F(x) = \int_{-\infty}^{x} F'(u) \, du.$$

Thus $F'$ is a probability density function. We may now state Theorem 6.2.3 in a more symmetric form familiar in the theory of Fourier integrals.

Corollary. If $f \in L^1$, then $p \in L^1$, where

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixt} f(t) \, dt,$$

and

$$f(t) = \int_{-\infty}^{\infty} e^{itx} p(x) \, dx.$$
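As a small illustration of the corollary, the sketch below recovers the Cauchy density from its ch.f. $e^{-|t|}$ (entry (12) of the table in Sec. 6.1 with $a = 1$) by a truncated midpoint rule. The truncation point $T$, the number of steps, and the function name are assumptions; the shortcut of integrating only $\cos(xt) f(t)$ is valid here because this $f$ is real and even, so the imaginary part integrates to zero.

```python
import math

def density_from_chf(chf, x, T=40.0, steps=80000):
    """Approximate p(x) = (1/2pi) * integral of e^{-i x t} f(t) dt for a real, even ch.f.
    The integral is truncated to [-T, T]; only cos(x t) f(t) contributes."""
    h = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        t = -T + (i + 0.5) * h
        total += math.cos(x * t) * chf(t)
    return total * h / (2.0 * math.pi)

cauchy_chf = lambda t: math.exp(-abs(t))

errors = [
    abs(density_from_chf(cauchy_chf, x) - 1.0 / (math.pi * (1.0 + x * x)))
    for x in (0.0, 1.0, 2.5)
]
print(errors)   # differences from the exact density 1 / (pi (1 + x^2))
```

The step count is even so that the kink of $e^{-|t|}$ at $t = 0$ falls on a cell boundary, which keeps the midpoint rule accurate.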


The next two theorems yield information on the atoms of $\mu$ by means of $f$ and are given here as illustrations of the method of "harmonic analysis".

Theorem 6.2.4. For each $x_0$, we have

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} e^{-itx_0} f(t) \, dt = \mu(\{x_0\}). \tag{7}$$

PROOF. Proceeding as in the proof of Theorem 6.2.1, we obtain for the integral average on the left side of (7):

$$\int_{\mathscr{R}^1 - \{x_0\}} \frac{\sin T(x - x_0)}{T(x - x_0)} \, \mu(dx) + \int_{\{x_0\}} 1 \, \mu(dx). \tag{8}$$

The integrand of the first integral above is bounded by 1 and tends to 0 as $T \to \infty$ everywhere in the domain of integration; hence the integral converges to 0 by bounded convergence. The second term is simply the right member of (7).


Theorem 6.2.5. We have

$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2 \, dt = \sum_{x \in \mathscr{R}^1} \mu(\{x\})^2. \tag{9}$$

PROOF. Since the set of atoms is countable, all but a countable number of terms in the sum above vanish, making the sum meaningful with a value bounded by 1. Formula (9) can be established directly in the manner of (5) and (7), but the following proof is more illuminating. As noted in the proof of the corollary to Theorem 6.1.4, $|f|^2$ is the ch.f. of the r.v. $X - Y$ there, whose distribution is $\mu * \mu'$, where $\mu'(B) = \mu(-B)$ for each $B \in \mathscr{B}^1$. Applying Theorem 6.2.4 with $x_0 = 0$, we see that the left member of (9) is equal to

$$(\mu * \mu')(\{0\}).$$

By (7) of Sec. 6.1, the latter may be evaluated as

$$\int_{\mathscr{R}^1} \mu'(\{-y\}) \, \mu(dy) = \sum_{y \in \mathscr{R}^1} \mu(\{y\})\,\mu(\{y\}),$$

since the integrand above is zero unless $-y$ is an atom of $\mu'$, which is the case if and only if $y$ is an atom of $\mu$. This gives the right member of (9). The reader may prefer to carry out the argument above using the r.v.'s $X$ and $-Y$ in the proof of the Corollary to Theorem 6.1.4.

Corollary. $\mu$ is atomless ($F$ is continuous) if and only if the limit in the left member of (9) is zero.

This criterion is occasionally practicable.
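Formulas (7) and (9) can both be tried on a Bernoullian distribution, whose atoms and masses are known. The sketch below computes the integral averages for a large but finite $T$; the values $p = 0.3$, $T = 500$, the step count, and the helper name `averaged` are assumptions made for the example.

```python
import cmath

def averaged(g, T, steps):
    """Midpoint-rule value of (1/2T) * integral of g over [-T, T]."""
    h = 2.0 * T / steps
    return sum(g(-T + (i + 0.5) * h) for i in range(steps)) * h / (2.0 * T)

p, q = 0.3, 0.7
f = lambda t: q + p * cmath.exp(1j * t)      # ch.f. of q*delta_0 + p*delta_1
T, steps = 500.0, 100000

# (7): mass of the atom at x0 = 1, and at x0 = 1/2 where there is no atom.
atom_at_1 = averaged(lambda t: cmath.exp(-1j * t * 1.0) * f(t), T, steps).real
atom_at_half = averaged(lambda t: cmath.exp(-1j * t * 0.5) * f(t), T, steps).real

# (9): sum of the squared atom masses, here p^2 + q^2 = 0.58.
sum_sq = averaged(lambda t: abs(f(t)) ** 2, T, steps)
print(atom_at_1, atom_at_half, sum_sq)
```

For this distribution the averages can be computed in closed form (for instance the first equals $p + q \sin T / T$), so the finite-$T$ values differ from the limits by terms of order $1/T$.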


DEFINITION. The r.v. X is called symmetric iff X and —X have the same 
distribution. 


For such an r.v., the distribution has the following property: 
VB € ZB: w(B) = w(—B). 


Such a p.m. may be called symmetric; an equivalent condition on its df. F is 


as follows: 
vx € #!: F(x) = 1 — F(-x-), 


(the awkwardness of using d.f. being obvious here). 


Theorem 6.2.6. $X$ or $\mu$ is symmetric if and only if its ch.f. is real-valued
(for all $t$).

PROOF. If $X$ and $-X$ have the same distribution, they must "determine"
the same ch.f. Hence, by (iii) of Sec. 6.1, we have

$$f(t) = \overline{f(t)}$$

and so $f$ is real-valued. Conversely, if $f$ is real-valued, the same argument
shows that $X$ and $-X$ must have the same ch.f. Hence, they have the same
distribution by the uniqueness theorem (Theorem 6.2.2).
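As a small numerical illustration of Theorem 6.2.6 (a sketch only; the two three-point distributions and the helper `chf` are assumptions introduced here), the imaginary part of the ch.f. vanishes for a symmetric law and not for an asymmetric one on the same points:

```python
import cmath

def chf(atoms, t):
    """Ch.f. of a discrete law given as {point: mass}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in atoms.items())

symmetric = {-2.0: 0.25, 0.0: 0.5, 2.0: 0.25}   # mu(B) = mu(-B)
skewed = {-2.0: 0.1, 0.0: 0.5, 2.0: 0.4}        # not symmetric

ts = [0.1 * k for k in range(1, 50)]
max_imag_symmetric = max(abs(chf(symmetric, t).imag) for t in ts)
max_imag_skewed = max(abs(chf(skewed, t).imag) for t in ts)
print(max_imag_symmetric, max_imag_skewed)
```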


EXERCISES 


f is the ch.f. of F below. 


1. Show that

$$\int_0^\infty \left(\frac{\sin x}{x}\right)^2 dx = \frac{\pi}{2}.$$


*2. Show that for each $T > 0$:

$$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{(1-\cos Tx)\cos tx}{x^2}\,dx = (T - |t|) \vee 0.$$

Deduce from this that for each $T > 0$, the function of $t$ given by

$$\left(1 - \frac{|t|}{T}\right) \vee 0$$

is a ch.f. Next, show that as a particular case of Theorem 6.2.3,

$$\frac{1-\cos Tx}{x^2} = \frac{1}{2}\int_{-T}^{T} (T - |t|)\,e^{itx}\,dt.$$

Finally, derive the following particularly useful relation (a case of Parseval's
relation in Fourier analysis), for arbitrary $a$ and $T > 0$:

$$\int_{-\infty}^{\infty} \frac{1-\cos T(x-a)}{[T(x-a)]^2}\,dF(x) = \frac{1}{2}\,\frac{1}{T^2}\int_{-T}^{T} (T-|t|)\,e^{-ita} f(t)\,dt.$$


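*3. Prove that for each $\alpha > 0$:

$$\int_0^{\alpha} [F(x+u) - F(x-u)]\,du = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{1-\cos \alpha t}{t^2}\,e^{-itx} f(t)\,dt.$$

As a sort of reciprocal, we have

$$\frac{1}{2}\int_0^{\alpha} du \int_{-u}^{u} f(t)\,dt = \int_{-\infty}^{\infty} \frac{1-\cos \alpha x}{x^2}\,dF(x).$$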


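4. If $f(t)/t \in L^1(-\infty, \infty)$, then for each $\alpha > 0$ such that $\pm\alpha$ are points
of continuity of $F$, we have

$$F(\alpha) - F(-\alpha) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \alpha t}{t}\,f(t)\,dt.$$

5. What is the special case of the inversion formula when $f \equiv 1$? Deduce
also the following special cases, where $\alpha > 0$:

$$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \alpha t\,\sin t}{t^2}\,dt = \alpha \wedge 1,$$

$$\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \alpha t\,(\sin t)^2}{t^3}\,dt = \alpha - \frac{\alpha^2}{4} \ \text{ for } \alpha \le 2;\quad 1 \ \text{ for } \alpha > 2.$$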


6. For each n > 0, we have 


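$$\frac{1}{\pi}\int_{-\infty}^{\infty} \left(\frac{\sin t}{t}\right)^{n+2} dt = \int_0^2 \int_0^u \varphi_n(t)\,dt\,du,$$

where $\varphi_1 = \frac{1}{2}1_{[-1,1]}$ and $\varphi_n = \varphi_{n-1} * \varphi_1$ for $n \ge 2$.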

*7. If $F$ is absolutely continuous, then $\lim_{|t|\to\infty} f(t) = 0$. Hence, if the
absolutely continuous part of $F$ does not vanish, then $\overline{\lim}_{t\to\infty} |f(t)| < 1$. If
$F$ is purely discontinuous, then $\overline{\lim}_{t\to\infty} f(t) = 1$. [The first assertion is the
Riemann–Lebesgue lemma; prove it first when $F$ has a density that is a
simple function, then approximate. The third assertion is an easy part of the
observation that such an $f$ is "almost periodic".]

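8. Prove that for $0 < r < 2$ we have

$$\int_{-\infty}^{\infty} |x|^r\,dF(x) = C(r)\int_{-\infty}^{\infty} \frac{1 - \mathrm{R} f(t)}{|t|^{r+1}}\,dt,$$

where

$$C(r) = \left(\int_{-\infty}^{\infty} \frac{1-\cos u}{|u|^{r+1}}\,du\right)^{-1} = \frac{\Gamma(r+1)}{\pi}\sin\frac{r\pi}{2};$$

thus $C(1) = 1/\pi$. [HINT:

$$|x|^r = C(r)\int_{-\infty}^{\infty} \frac{1-\cos xt}{|t|^{r+1}}\,dt.]$$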


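*9. Give a trivial example where the right member of (4) cannot be
replaced by the Lebesgue integral

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} \mathrm{R}\,\frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\,dt.$$

But it can always be replaced by the improper Riemann integral:

$$\lim_{T_1\to-\infty,\ T_2\to+\infty} \frac{1}{2\pi}\int_{T_1}^{T_2} \mathrm{R}\,\frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\,dt.$$

10. Prove the following form of the inversion formula (due to Gil-Pelaez):

$$\frac{1}{2}\{F(x+) + F(x-)\} = \frac{1}{2} + \lim_{T\uparrow\infty,\ \delta\downarrow 0} \int_{\delta}^{T} \frac{e^{itx} f(-t) - e^{-itx} f(t)}{2\pi i t}\,dt.$$

[HINT: Use the method of proof of Theorem 6.2.1 rather than the result.]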


11. Theorem 6.2.3 has an analogue in $L^2$. If the ch.f. $f$ of $F$ belongs
to $L^2$, then $F$ is absolutely continuous. [HINT: By Plancherel's theorem, there
exists $\varphi \in L^2$ such that

$$\int_0^x \varphi(u)\,du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{e^{-itx} - 1}{-it}\,f(t)\,dt.$$

Now use the inversion formula to show that

$$F(x) - F(0) = \frac{1}{\sqrt{2\pi}}\int_0^x \varphi(u)\,du.]$$


*12. Prove Theorem 6.2.2 by the Stone–Weierstrass theorem. [HINT: Cf.
Theorem 6.6.2 below, but beware of the differences. Approximate uniformly
$g_1$ and $g_2$ in the proof of Theorem 4.4.3 by a periodic function with "arbitrarily
large" period.]

13. The uniqueness theorem holds as well for signed measures [or func-
tions of bounded variations]. Precisely, if each $\mu_i$, $i = 1, 2$, is the difference
of two finite measures such that

$$\forall t:\ \int e^{itx}\,\mu_1(dx) = \int e^{itx}\,\mu_2(dx),$$

then $\mu_1 \equiv \mu_2$.

14. There is a deeper supplement to the inversion formula (4) or
Exercise 10 above, due to B. Rosén. Under the condition


fo + log |x|)dF (x) < oo, 


the improper Riemann integral in Exercise 10 may be replaced by a Lebesgue
integral. [HINT: It is a matter of proving the existence of the latter. Since

$$\int_{-\infty}^{\infty} dF(y)\int_0^N \left|\frac{\sin (x-y)t}{t}\right| dt \le \int_{-\infty}^{\infty} dF(y)\,\{1 + \log(1 + N|x-y|)\} < \infty,$$

we have

$$\int_{-\infty}^{\infty} dF(y)\int_0^N \frac{\sin (x-y)t}{t}\,dt = \int_0^N \frac{dt}{t}\int_{-\infty}^{\infty} \sin (x-y)t\,dF(y).$$

For fixed $x$, we have

$$\int_{y\ne x} dF(y)\left|\int_N^{\infty} \frac{\sin (x-y)t}{t}\,dt\right| \le C_1\int_{|x-y|\ge 1/N} \frac{dF(y)}{N|x-y|} + C_2\int_{0<|x-y|<1/N} dF(y),$$

both integrals on the right converging to 0 as $N \to \infty$.]


6.3 Convergence theorems 


For purposes of probability theory the most fundamental property of the ch.f.
is given in the two propositions below, which will be referred to jointly as the
convergence theorem, due to P. Lévy and H. Cramér. Many applications will
be given in this and the following chapter. We begin with the easier half.


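Theorem 6.3.1. Let $\{\mu_n, 1 \le n \le \infty\}$ be p.m.'s on $\mathcal{R}^1$ with ch.f.'s $\{f_n, 1 \le
n \le \infty\}$. If $\mu_n$ converges vaguely to $\mu_\infty$, then $f_n$ converges to $f_\infty$ uniformly
in every finite interval. We shall write this symbolically as

(1) $$\mu_n \xrightarrow{v} \mu_\infty \ \Longrightarrow\ f_n \xrightarrow{u} f_\infty.$$

Furthermore, the family $\{f_n\}$ is equicontinuous on $\mathcal{R}^1$.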


PROOF. Since $e^{itx}$ is a bounded continuous function on $\mathcal{R}^1$, although
complex-valued, Theorem 4.4.2 applies to its real and imaginary parts and
yields (1) at once, apart from the asserted uniformity. Now for every $t$ and $h$,
we have, as in (11) of Sec. 6.1:

$$|f_n(t+h) - f_n(t)| \le \int |e^{ihx} - 1|\,\mu_n(dx) \le \int_{|x|\le A} |hx|\,\mu_n(dx) + 2\int_{|x|>A} \mu_n(dx)$$

$$\le |h|A + 2\int_{|x|>A} \mu_\infty(dx) + \epsilon$$

for any $\epsilon > 0$, suitable $A$ and $n \ge n_0(A, \epsilon)$. The equicontinuity of $\{f_n\}$ follows.
This and the pointwise convergence $f_n \to f_\infty$ imply $f_n \xrightarrow{u} f_\infty$ by a simple
compactness argument (the "$3\epsilon$ argument") left to the reader.


Theorem 6.3.2. Let $\{\mu_n, 1 \le n < \infty\}$ be p.m.'s on $\mathcal{R}^1$ with ch.f.'s $\{f_n, 1 \le
n < \infty\}$. Suppose that

(a) $f_n$ converges everywhere in $\mathcal{R}^1$ and defines the limit function $f_\infty$;
(b) $f_\infty$ is continuous at $t = 0$.

Then we have

($\alpha$) $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$ is a p.m.;
($\beta$) $f_\infty$ is the ch.f. of $\mu_\infty$.


PROOF. Let us first relax the conditions (a) and (b) to require only conver-
gence of $f_n$ in a neighborhood $(-\delta_0, \delta_0)$ of $t = 0$ and the continuity of the
limit function $f$ (defined only in this neighborhood) at $t = 0$. We shall prove
that any vaguely convergent subsequence of $\{\mu_n\}$ converges to a p.m. $\mu$. For
this we use the following lemma, which illustrates a useful technique for
obtaining estimates on a p.m. from its ch.f.


Lemma. For each $A > 0$, we have

(2) $$\mu([-2A, 2A]) \ge A\left|\int_{-A^{-1}}^{A^{-1}} f(t)\,dt\right| - 1.$$

PROOF OF THE LEMMA. By (8) of Sec. 6.2, we have

(3) $$\frac{1}{2T}\int_{-T}^{T} f(t)\,dt = \int_{-\infty}^{\infty} \frac{\sin Tx}{Tx}\,\mu(dx).$$

Since the integrand on the right side is bounded by 1 for all $x$ (it is defined to be
1 at $x = 0$), and by $|Tx|^{-1} \le (2TA)^{-1}$ for $|x| > 2A$, the integral is bounded by

$$\mu([-2A, 2A]) + \frac{1}{2TA}\{1 - \mu([-2A, 2A])\} = \left(1 - \frac{1}{2TA}\right)\mu([-2A, 2A]) + \frac{1}{2TA}.$$

Putting $T = A^{-1}$ in (3), we obtain

$$\left|\frac{A}{2}\int_{-A^{-1}}^{A^{-1}} f(t)\,dt\right| \le \frac{1}{2}\mu([-2A, 2A]) + \frac{1}{2},$$

which reduces to (2). The lemma is proved.
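The inequality (2) can be checked in a case where both sides have closed forms. The sketch below assumes, purely for illustration, that $\mu$ is the standard normal distribution, so that $f(t) = e^{-t^2/2}$; both sides are then expressible through the error function. The function name `lemma_sides` is hypothetical.

```python
import math

def lemma_sides(A):
    """Both members of (2) for the standard normal law (an assumed example).

    left  = mu([-2A, 2A]) = erf(sqrt(2) * A)
    right = A * |integral_{-1/A}^{1/A} exp(-t^2/2) dt| - 1
          = A * sqrt(2*pi) * erf(1 / (A * sqrt(2))) - 1
    """
    left = math.erf(math.sqrt(2.0) * A)
    right = A * math.sqrt(2.0 * math.pi) * math.erf(1.0 / (A * math.sqrt(2.0))) - 1.0
    return left, right

for A in (0.1, 0.5, 1.0, 3.0, 10.0):
    left, right = lemma_sides(A)
    print(A, left, right, left >= right)
```

The bound is crude for small A (the right member is then negative) and becomes informative as A grows, which is how it is used in the proof that follows.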
Now for each $\delta$, $0 < \delta < \delta_0$, we have

(4) $$\left|\frac{1}{2\delta}\int_{-\delta}^{\delta} f_n(t)\,dt\right| \ge \left|\frac{1}{2\delta}\int_{-\delta}^{\delta} f(t)\,dt\right| - \frac{1}{2\delta}\int_{-\delta}^{\delta} |f_n(t) - f(t)|\,dt.$$

The first term on the right side tends to 1 as $\delta \downarrow 0$, since $f(0) = 1$ and $f$
is continuous at 0; for fixed $\delta$ the second term tends to 0 as $n \to \infty$, by
bounded convergence since $|f_n - f| \le 2$. It follows that for any given $\epsilon > 0$,
there exist $\delta = \delta(\epsilon) < \delta_0$ and $n_0 = n_0(\epsilon)$ such that if $n \ge n_0$, then the left
member of (4) has a value not less than $1 - \epsilon$. Hence by (2)

(5) $$\mu_n([-2\delta^{-1}, 2\delta^{-1}]) \ge 2(1 - \epsilon) - 1 = 1 - 2\epsilon.$$


Let $\{\mu_{n_k}\}$ be a vaguely convergent subsequence of $\{\mu_n\}$, which always
exists by Theorem 4.3.3; and let the vague limit be $\mu$, which is always an
s.p.m. For each $\delta$ satisfying the conditions above, and such that neither $-2\delta^{-1}$
nor $2\delta^{-1}$ is an atom of $\mu$, we have by the property of vague convergence
and (5):

$$\mu(\mathcal{R}^1) \ge \mu([-2\delta^{-1}, 2\delta^{-1}]) = \lim_{k\to\infty} \mu_{n_k}([-2\delta^{-1}, 2\delta^{-1}]) \ge 1 - 2\epsilon.$$

Since $\epsilon$ is arbitrary, we conclude that $\mu$ is a p.m., as was to be shown.

Let $f$ be the ch.f. of $\mu$. Then, by the preceding theorem, $f_{n_k} \to f$ every-
where; hence under the original hypothesis (a) we have $f = f_\infty$. Thus every
vague limit $\mu$ considered above has the same ch.f. and therefore by the unique-
ness theorem is the same p.m. Rename it $\mu_\infty$ so that $\mu_\infty$ is the p.m. having
the ch.f. $f_\infty$. Then by Theorem 4.3.4 we have $\mu_n \xrightarrow{v} \mu_\infty$. Both assertions ($\alpha$)
and ($\beta$) are proved.


As a particular case of the above theorem: if $\{\mu_n, 1 \le n \le \infty\}$ and
$\{f_n, 1 \le n \le \infty\}$ are corresponding p.m.'s and ch.f.'s, then the converse of
(1) is also true, namely:

(6) $$\mu_n \xrightarrow{v} \mu_\infty \ \Longleftrightarrow\ f_n \xrightarrow{u} f_\infty.$$

This is an elegant statement, but it lacks the full strength of Theorem 6.3.2,
which lies in concluding that $f_\infty$ is a ch.f. from more easily verifiable condi-
tions, rather than assuming it. We shall see presently how important this is.
Let us examine some cases of inapplicability of Theorems 6.3.1 and 6.3.2.


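Example 1. Let $\mu_n$ have mass $\frac{1}{2}$ at 0 and mass $\frac{1}{2}$ at $n$. Then $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$
has mass $\frac{1}{2}$ at 0 and is not a p.m. We have

$$f_n(t) = \frac{1}{2} + \frac{1}{2}e^{int},$$

which does not converge as $n \to \infty$, except when $t$ is equal to a multiple of $2\pi$.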


Example 2. Let $\mu_n$ be the uniform distribution on $[-n, n]$. Then $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$
is identically zero. We have

$$f_n(t) = \begin{cases} \dfrac{\sin nt}{nt}, & \text{if } t \ne 0; \\ 1, & \text{if } t = 0; \end{cases}$$

and

$$f_n(t) \to f(t) = \begin{cases} 0, & \text{if } t \ne 0; \\ 1, & \text{if } t = 0. \end{cases}$$

Thus, condition (a) is satisfied but (b) is not.
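The behaviour in Example 2 is easy to tabulate. The following sketch (the function name `f_n` and the sample point t = 0.5 are choices made here) shows the pointwise limit being 0 away from the origin and 1 at the origin, so that the limit function is discontinuous at $t = 0$:

```python
import math

def f_n(n, t):
    """Ch.f. of the uniform distribution on [-n, n]."""
    return 1.0 if t == 0 else math.sin(n * t) / (n * t)

for n in (1, 10, 100, 10**4, 10**6):
    # |f_n(t)| <= 1/(n|t|) for t != 0, while f_n(0) = 1 for every n.
    print(n, f_n(n, 0.5), f_n(n, 0.0))
```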


Later we shall see that (a) cannot be relaxed to read: $f_n(t)$ converges in
$|t| \le T$ for some fixed $T$ (Exercise 9 of Sec. 6.5).

The convergence theorem above settles the question of vague conver-
gence of p.m.'s to a p.m. What about just vague convergence without restric-
tion on the limit? Recalling Theorem 4.4.3, this suggests first that we replace
the integrand $e^{itx}$ in the ch.f. $f$ by a function in $C_0$. Secondly, going over the
last part of the proof of Theorem 6.3.2, we see that the choice should be made
so as to determine uniquely an s.p.m. (see Sec. 4.3). Now the Fourier–Stieltjes
transform of an s.p.m. is well defined and the inversion formula remains valid,
so that there is unique correspondence just as in the case of a p.m. Thus a
natural choice of $g$ is given by an "indefinite integral" of a ch.f., as follows:

(7) $$g(u) = \int_0^u \left[\int_{\mathcal{R}^1} e^{itx}\,\mu(dx)\right] dt = \int_{\mathcal{R}^1} \frac{e^{iux} - 1}{ix}\,\mu(dx).$$

Let us call $g$ the integrated characteristic function of the s.p.m. $\mu$. We are
thus led to the following companion of (6), the details of the proof being left
as an exercise.


Theorem 6.3.3. A sequence of s.p.m.'s $\{\mu_n, 1 \le n < \infty\}$ converges (to $\mu_\infty$)
if and only if the corresponding sequence of integrated ch.f.'s $\{g_n\}$ converges
(to the integrated ch.f. of $\mu_\infty$).


Another question concerning (6) arises naturally. We know that vague
convergence for p.m.'s is metric (Exercise 9 of Sec. 4.4); let the metric be
denoted by $\langle\cdot,\cdot\rangle_1$. Uniform convergence on compacts (viz., in finite intervals)
for uniformly bounded subsets of $C_B(\mathcal{R}^1)$ is also metric, with the metric
denoted by $\langle\cdot,\cdot\rangle_2$, defined as follows:

$$\langle f, g\rangle_2 = \sup_{t\in\mathcal{R}^1} \frac{|f(t) - g(t)|}{1 + t^2}.$$

It is easy to verify that this is a metric on $C_B$ and that convergence in this metric
is equivalent to uniform convergence on compacts; clearly the denominator
$1 + t^2$ may be replaced by any function continuous on $\mathcal{R}^1$, bounded below
by a strictly positive constant, and tending to $+\infty$ as $|t| \to \infty$. Since there is
a one-to-one correspondence between ch.f.'s and p.m.'s, we may transfer the
metric $\langle\cdot,\cdot\rangle_2$ to the latter by setting

$$\langle\mu, \nu\rangle_2 = \langle f_\mu, f_\nu\rangle_2$$

in obvious notation. Now the relation (6) may be restated as follows.


Theorem 6.3.4. The topologies induced by the two metrics $\langle\ \rangle_1$ and $\langle\ \rangle_2$ on
the space of p.m.'s on $\mathcal{R}^1$ are equivalent.

This means that for each $\mu$ and given $\epsilon > 0$, there exists $\delta(\mu, \epsilon)$ such that:

$$\langle\mu, \nu\rangle_1 \le \delta(\mu, \epsilon) \ \Rightarrow\ \langle\mu, \nu\rangle_2 \le \epsilon,$$
$$\langle\mu, \nu\rangle_2 \le \delta(\mu, \epsilon) \ \Rightarrow\ \langle\mu, \nu\rangle_1 \le \epsilon.$$

Theorem 6.3.4 needs no new proof, since it is merely a paraphrasing of (6) in
new words. However, it is important to notice the dependence of $\delta$ on $\mu$ (as
well as $\epsilon$) above. The sharper statement without this dependence, which would
mean the equivalence of the uniform structures induced by the two metrics,
is false with a vengeance; see Exercises 10 and 11 below (Exercises 3 and 4
are also relevant).


EXERCISES 


1. Prove the uniform convergence of $f_n$ in Theorem 6.3.1 by an inte-
gration by parts of $\int e^{itx}\,dF_n(x)$.

*2. Instead of using the Lemma in the second part of Theorem 6.3.2,
prove that $\mu$ is a p.m. by integrating the inversion formula, as in Exercise 3
of Sec. 6.2. (Integration is a smoothing operation and a standard technique in
taming improper integrals: cf. the proof of the second part of Theorem 6.5.2
below.)

3. Let $F$ be a given absolutely continuous d.f. and let $F_n$ be a sequence
of step functions with equally spaced steps that converge to $F$ uniformly in
$\mathcal{R}^1$. Show that for the corresponding ch.f.'s we have

$$\forall n:\ \sup_{t\in\mathcal{R}^1} |f(t) - f_n(t)| = 1.$$

4. Let $F_n$, $G_n$ be d.f.'s with ch.f.'s $f_n$ and $g_n$. If $f_n - g_n \to 0$ a.e.,
then for each $f \in C_K$ we have $\int f\,dF_n - \int f\,dG_n \to 0$ (see Exercise 10
of Sec. 4.4). This does not imply the Lévy distance $\langle F_n, G_n\rangle_1 \to 0$; find
a counterexample. [HINT: Use Exercise 3 of Sec. 6.2 and proceed as in
Theorem 4.3.4.]

5. Let $F$ be a discrete d.f. with points of jump $\{a_j, j \ge 1\}$ and sizes of
jump $\{b_j, j \ge 1\}$. Consider the approximating s.d.f.'s $F_n$ with the same jumps
but restricted to $j \le n$. Show that $F_n \xrightarrow{v} F$.

*6. If the sequence of ch.f.'s $\{f_n\}$ converges uniformly in a neighborhood
of the origin, then $\{f_n\}$ is equicontinuous, and there exists a subsequence that
converges to a ch.f. [HINT: Use Ascoli–Arzelà's theorem.]

7. If $F_n \xrightarrow{v} F$ and $G_n \xrightarrow{v} G$, then $F_n * G_n \xrightarrow{v} F * G$. [A proof of this simple
result without the use of ch.f.'s would be tedious.]

*8. Interpret the remarkable trigonometric identity

$$\frac{\sin t}{t} = \prod_{n=1}^{\infty} \cos\frac{t}{2^n}$$

in terms of ch.f.'s, and hence by addition of independent r.v.'s. (This is an
example of Exercise 4 of Sec. 6.1.)

9. Rewrite the preceding formula as

$$\frac{\sin t}{t} = \left(\prod_{k=1}^{\infty} \cos\frac{t}{2^{2k-1}}\right)\left(\prod_{k=1}^{\infty} \cos\frac{t}{2^{2k}}\right).$$

Prove that either factor on the right is the ch.f. of a singular distribution. Thus
the convolution of two such may be absolutely continuous. [HINT: Use the
same r.v.'s as for the Cantor distribution in Exercise 9 of Sec. 5.3.]

10. Using the strong law of large numbers, prove that the convolution of
two Cantor d.f.'s is still singular. [HINT: Inspect the frequency of the digits in
the sum of the corresponding random series; see Exercise 9 of Sec. 5.3.]

*11. Let $F_n$, $G_n$ be the d.f.'s of $\mu_n$, $\nu_n$, and $f_n$, $g_n$ their ch.f.'s. Even if
$\sup_{x\in\mathcal{R}^1} |F_n(x) - G_n(x)| \to 0$, it does not follow that $\langle f_n, g_n\rangle_2 \to 0$; indeed
it may happen that $\langle f_n, g_n\rangle_2 = 1$ for every $n$. [HINT: Take two step functions
"out of phase".]

12. In the notation of Exercise 11, even if $\sup_{t\in\mathcal{R}^1} |f_n(t) - g_n(t)| \to 0$,
it does not follow that $\langle F_n, G_n\rangle_1 \to 0$; indeed it may $\to 1$. [HINT: Let $f$ be any
ch.f. vanishing outside $(-1, 1)$, $f_j(t) = e^{-in_jt} f(m_jt)$, $g_j(t) = e^{in_jt} f(m_jt)$, and
$F_j$, $G_j$ be the corresponding d.f.'s. Note that if $m_jn_j^{-1} \to 0$, then $F_j(x) \to 1$,
$G_j(x) \to 0$ for every $x$, and that $f_j - g_j$ vanishes outside $(-m_j^{-1}, m_j^{-1})$ and
is $O(\sin n_jt)$ near $t = 0$. If $m_j = 2^{j^2}$ and $n_j = jm_j$ then $\sum_j (f_j - g_j)$ is
uniformly bounded in $t$: for $m_{k+1}^{-1} < t \le m_k^{-1}$ consider $j > k$, $j = k$, $j < k$ sepa-
rately. Let

$$f_n^* = n^{-1}\sum_{j=1}^{n} f_j, \qquad g_n^* = n^{-1}\sum_{j=1}^{n} g_j;$$

then $\sup |f_n^* - g_n^*| = O(n^{-1})$ while $F_n^* - G_n^* \not\to 0$. This example is due to
Katznelson, rivaling an older one due to Dyson, which is as follows. For
$b > a > 0$, let

$$F(x) = \frac{\log\left(\dfrac{x^2 + b^2}{x^2 + a^2}\right)}{2\log\left(\dfrac{b}{a}\right)}$$

for $x < 0$ and $= 1$ for $x > 0$; $G(x) = 1 - F(-x)$. Then,

$$f(t) - g(t) = -\pi i\,\frac{t}{|t|}\,\frac{e^{-a|t|} - e^{-b|t|}}{\log\left(\dfrac{b}{a}\right)}.$$

If $a$ is large, then $\langle F, G\rangle_1$ is near 1. If $b/a$ is large, then $\langle f, g\rangle_2$ is near 0.]


6.4 Simple applications 


A common type of application of Theorem 6.3.2 depends on power-series 
expansions of the ch.f., for which we need its derivatives. 


Theorem 6.4.1. If the d.f. has a finite absolute moment of positive integral
order $k$, then its ch.f. has a bounded continuous derivative of order $k$ given by

(1) $$f^{(k)}(t) = \int_{-\infty}^{\infty} (ix)^k e^{itx}\,dF(x).$$

Conversely, if $f$ has a finite derivative of even order $k$ at $t = 0$, then $F$ has
a finite moment of order $k$.


PROOF. For $k = 1$, the first assertion follows from the formula:

$$\frac{f(t+h) - f(t)}{h} = \int_{-\infty}^{\infty} \frac{e^{i(t+h)x} - e^{itx}}{h}\,dF(x).$$

An elementary inequality already used in the proof of Theorem 6.2.1 shows
that the integrand above is dominated by $|x|$. Hence if $\int |x|\,dF(x) < \infty$, we
may let $h \to 0$ under the integral sign and obtain (1). Uniform continuity of
the integral as a function of $t$ is proved as in (ii) of Sec. 6.1. The case of a
general $k$ follows easily by induction.

To prove the second assertion, let $k = 2$ and suppose that $f''(0)$ exists
and is finite. We have

$$f''(0) = \lim_{h\to 0} \frac{f(h) - 2f(0) + f(-h)}{h^2}$$

$$= \lim_{h\to 0} \int \frac{e^{ihx} - 2 + e^{-ihx}}{h^2}\,dF(x)$$

(2) $$= -2\lim_{h\to 0} \int \frac{1 - \cos hx}{h^2}\,dF(x).$$

As $h \to 0$, we have by Fatou's lemma,

$$\int x^2\,dF(x) = 2\int \lim_{h\to 0} \frac{1 - \cos hx}{h^2}\,dF(x) \le \liminf_{h\to 0}\, 2\int \frac{1 - \cos hx}{h^2}\,dF(x) = -f''(0).$$

Thus $F$ has a finite second moment, and the validity of (1) for $k = 2$ now
follows from the first assertion of the theorem.

The general case can again be reduced to this by induction, as follows.
Suppose the second assertion of the theorem is true for $2k - 2$, and that
$f^{(2k)}(0)$ is finite. Then $f^{(2k-2)}(t)$ exists and is continuous in the neighborhood
of $t = 0$, and by the induction hypothesis we have in particular

$$(-1)^{k-1}\int x^{2k-2}\,dF(x) = f^{(2k-2)}(0).$$

Put $G(x) = \int_{-\infty}^{x} y^{2k-2}\,dF(y)$ for every $x$, then $G(\cdot)/G(\infty)$ is a d.f. with the
ch.f.

$$\psi(t) = \frac{1}{G(\infty)}\int e^{itx} x^{2k-2}\,dF(x) = \frac{(-1)^{k-1} f^{(2k-2)}(t)}{G(\infty)}.$$

Hence $\psi''$ exists, and by the case $k = 2$ proved above, we have

$$-\psi''(0) = \frac{1}{G(\infty)}\int x^2\,dG(x) = \frac{1}{G(\infty)}\int x^{2k}\,dF(x).$$

Upon cancelling $G(\infty)$, we obtain

$$(-1)^k f^{(2k)}(0) = \int x^{2k}\,dF(x),$$

which proves the finiteness of the $2k$th moment. The argument above fails
if $G(\infty) = 0$, but then we have (why?) $F = \delta_0$, $f = 1$, and the theorem is
trivial.
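The symmetric difference quotient in (2) lends itself to a quick numerical check. In the sketch below the distribution is assumed, for illustration only, to be uniform on $[-1, 1]$, whose ch.f. is $\sin t/t$ and whose second moment is $1/3$; the names `f` and `second_difference` are introduced here.

```python
import math

def f(t):
    """Ch.f. of the uniform distribution on [-1, 1] (assumed example)."""
    return 1.0 if t == 0 else math.sin(t) / t

def second_difference(h):
    """Symmetric second difference quotient appearing in (2)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)

for h in (1e-1, 1e-2, 1e-3):
    # Should approach f''(0) = -(second moment) = -1/3 as h -> 0.
    print(h, second_difference(h))
```

Much smaller values of h would eventually suffer from floating-point cancellation, so the step is kept moderate.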

Although the next theorem is an immediate corollary to the preceding 
one, it is so important as to deserve prominent mention. 


Theorem 6.4.2. If $F$ has a finite absolute moment of order $k$, $k$ an integer
$\ge 1$, then $f$ has the following expansion in the neighborhood of $t = 0$:

(3) $$f(t) = \sum_{j=0}^{k} \frac{i^j}{j!} m^{(j)} t^j + o(|t|^k),$$

(3') $$f(t) = \sum_{j=0}^{k-1} \frac{i^j}{j!} m^{(j)} t^j + \frac{\theta_k}{k!} \mu^{(k)} |t|^k;$$

where $m^{(j)}$ is the moment of order $j$, $\mu^{(k)}$ is the absolute moment of order $k$,
and $|\theta_k| \le 1$.


PROOF. According to a theorem in calculus (see, e.g., Hardy [1], p. 290),
if $f$ has a finite $k$th derivative at the point $t = 0$, then the Taylor expansion
below is valid:

(4) $$f(t) = \sum_{j=0}^{k} \frac{f^{(j)}(0)}{j!} t^j + o(|t|^k).$$

If $f$ has a finite $k$th derivative in the neighborhood of $t = 0$, then

(4') $$f(t) = \sum_{j=0}^{k-1} \frac{f^{(j)}(0)}{j!} t^j + \frac{f^{(k)}(\theta t)}{k!} t^k, \quad |\theta| \le 1.$$

Since the absolute moment of order $j$ is finite for $1 \le j \le k$, and

$$f^{(j)}(0) = i^j m^{(j)}, \quad |f^{(k)}(\theta t)| \le \mu^{(k)}$$

from (1), we obtain (3) from (4), and (3') from (4').


It should be remarked that the form of Taylor expansion given in (4)
is not always given in textbooks as cited above, but rather under stronger
assumptions, such as "$f$ has a finite $k$th derivative in the neighborhood of
0". [For even $k$ this stronger condition is actually implied by the weaker one
stated in the proof above, owing to Theorem 6.4.1.] The reader is advised
to learn the sharper result in calculus, which incidentally also yields a quick
proof of the first equation in (2). Observe that (3) implies (3') if the last term
in (3') is replaced by the more ambiguous $O(|t|^k)$, but not as it stands, since
the constant in "$O$" may depend on the function $f$ and not just on $\mu^{(k)}$.

By way of illustrating the power of the method of ch.f.’s without the 
encumbrance of technicalities, although anticipating more elaborate develop- 
ments in the next chapter, we shall apply at once the results above to prove two 
classical limit theorems: the weak law of large numbers (cf. Theorem 5.2.2), 
and the central limit theorem in the identically distributed and finite variance 
case. We begin with an elementary lemma from calculus, stated here for the 
sake of clarity. 


Lemma. If the complex numbers $c_n$ have the limit $c$, then

(5) $$\lim_{n\to\infty} \left(1 + \frac{c_n}{n}\right)^n = e^c.$$

(For real $c_n$'s this remains valid for $c = +\infty$.)

Now let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.'s with the common
d.f. $F$, and $S_n = \sum_{j=1}^{n} X_j$, as in Chapter 5.
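The limit (5) in the lemma above is used repeatedly with complex $c_n$, so a short numerical illustration may help. This is a sketch only: the particular complex limit $c$ and the sequence $c_n = c + 1/n$ are arbitrary assumptions made here.

```python
import cmath

def power_sequence(c, n):
    """(1 + c_n/n)^n for the illustrative sequence c_n = c + 1/n -> c."""
    c_n = c + 1.0 / n
    return (1 + c_n / n) ** n

c = complex(-0.3, 0.7)                 # assumed complex limit
for n in (10, 10**3, 10**6):
    print(n, power_sequence(c, n), cmath.exp(c))
```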


Theorem 6.4.3. If $F$ has a finite mean $m$, then

$$\frac{S_n}{n} \to m \quad \text{in pr.}$$

PROOF. Since convergence to the constant $m$ is equivalent to that in dist.
to $\delta_m$ (Exercise 4 of Sec. 4.4), it is sufficient by Theorem 6.3.2 to prove that
the ch.f. of $S_n/n$ converges to $e^{imt}$ (which is continuous). Now, by (2) of
Sec. 6.1 we have

$$\mathcal{E}(e^{it(S_n/n)}) = \mathcal{E}(e^{i(t/n)S_n}) = \left[f\left(\frac{t}{n}\right)\right]^n.$$

By Theorem 6.4.2, the last term above may be written as

$$\left(1 + im\frac{t}{n} + o\left(\frac{t}{n}\right)\right)^n$$

for fixed $t$ and $n \to \infty$. It follows from (5) that this converges to $e^{imt}$ as
desired.
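The convergence $[f(t/n)]^n \to e^{imt}$ used in this proof can be observed directly. The sketch below assumes, for illustration, the exponential distribution with mean $m = 1$, whose ch.f. is $1/(1 - it)$; the function name `chf_of_mean` is hypothetical.

```python
import cmath

def chf_of_mean(t, n):
    """Ch.f. of S_n/n when the X_j are exponential with mean 1 (assumed example):
    [f(t/n)]^n with f(s) = 1 / (1 - i s)."""
    return (1.0 / (1.0 - 1j * t / n)) ** n

t = 2.0
for n in (10, 10**3, 10**5):
    print(n, chf_of_mean(t, n), cmath.exp(1j * t))   # limit is e^{imt} with m = 1
```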
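Theorem 6.4.4. If $F$ has mean $m$ and finite variance $\sigma^2 > 0$, then

$$\frac{S_n - mn}{\sigma\sqrt{n}} \to \Phi \quad \text{in dist.}$$

where $\Phi$ is the normal distribution with mean 0 and variance 1.

PROOF. We may suppose $m = 0$ by considering the r.v.'s $X_j - m$, whose
second moment is $\sigma^2$. As in the preceding proof, we have

$$\mathcal{E}\left(\exp\left(it\frac{S_n}{\sigma\sqrt{n}}\right)\right) = f\left(\frac{t}{\sigma\sqrt{n}}\right)^n$$

$$= \left\{1 + \frac{i^2\sigma^2}{2}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 + o\left(\frac{|t|}{\sigma\sqrt{n}}\right)^2\right\}^n$$

$$= \left\{1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right)\right\}^n \to e^{-t^2/2}.$$

The limit being the ch.f. of $\Phi$, the proof is ended.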
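The same kind of check applies to the central limit theorem. As a sketch under an assumed example, take $X_j = \pm 1$ with probability $\frac{1}{2}$ each, so that $m = 0$, $\sigma = 1$ and $f(t) = \cos t$; the ch.f. of $S_n/\sqrt{n}$ is then $\cos(t/\sqrt{n})^n$. The name `chf_normalized_sum` is introduced here.

```python
import math

def chf_normalized_sum(t, n):
    """Ch.f. of S_n / sqrt(n) for symmetric +-1 steps (assumed example)."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
for n in (10, 10**2, 10**4):
    print(n, chf_normalized_sum(t, n), math.exp(-t * t / 2.0))
```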


The convergence theorem for ch.f.'s may be used to complete the method
of moments in Theorem 4.5.5, yielding a result not very far from what ensues
from that theorem coupled with Carleman's condition mentioned there.

Theorem 6.4.5. In the notation of Theorem 4.5.5, if (8) there holds together
with the following condition:

(6) $$\forall t \in \mathcal{R}^1:\ \lim_{k\to\infty} \frac{m^{(k)} t^k}{k!} = 0,$$

then $F_n \xrightarrow{v} F$.
PROOF. Let $f_n$ be the ch.f. of $F_n$. For fixed $t$ and an odd $k$ we have by
the Taylor expansion for $e^{itx}$ with a remainder term:

$$f_n(t) = \int e^{itx}\,dF_n(x) = \int \left\{\sum_{j=0}^{k} \frac{(itx)^j}{j!} + \theta\frac{|tx|^{k+1}}{(k+1)!}\right\} dF_n(x)$$

$$= \sum_{j=0}^{k} \frac{(it)^j}{j!} m_n^{(j)} + \theta\frac{m_n^{(k+1)} t^{k+1}}{(k+1)!},$$

where $\theta$ denotes a "generic" complex number of modulus $\le 1$, not necessarily
the same in different appearances. (The use of such a symbol, to be further
indulged in the next chapter, avoids the necessity of transposing terms and
taking moduli of long expressions. It is extremely convenient, but one must
occasionally watch the dependence of $\theta$ on various quantities involved.) It
follows that

(7) $$f_n(t) - f(t) = \sum_{j=0}^{k} \frac{(it)^j}{j!}\left[m_n^{(j)} - m^{(j)}\right] + \frac{\theta t^{k+1}}{(k+1)!}\left[m_n^{(k+1)} + m^{(k+1)}\right].$$


Given $\epsilon > 0$, by condition (6) there exists an odd $k = k(\epsilon)$ such that for the
fixed $t$ we have

$$\frac{(2m^{(k+1)} + 1)\,t^{k+1}}{(k+1)!} \le \frac{\epsilon}{2}.$$

Since we have fixed $k$, there exists $n_0 = n_0(\epsilon)$ such that if $n \ge n_0$, then

$$m_n^{(k+1)} \le m^{(k+1)} + 1,$$

and moreover,

$$\max_{1\le j\le k} \left|m_n^{(j)} - m^{(j)}\right| \le e^{-|t|}\frac{\epsilon}{2}.$$

Then the right side of (7) will not exceed in modulus:

$$\sum_{j=0}^{k} \frac{|t|^j}{j!} e^{-|t|}\frac{\epsilon}{2} + \frac{t^{k+1}(2m^{(k+1)} + 1)}{(k+1)!} \le \epsilon.$$

Hence $f_n(t) \to f(t)$ for each $t$, and since $f$ is a ch.f., the hypotheses of
Theorem 6.3.2 are satisfied and so $F_n \xrightarrow{v} F$.


As another kind of application of the convergence theorem in which a 
limiting process is implicit rather than explicit, let us prove the following 
characterization of the normal distribution. 


Theorem 6.4.6. Let X and Y be independent, identically distributed r.v.’s 
with mean 0 and variance 1. If X + Y and X — Y are independent then the 
common distribution of X and Y is ®. 


PROOF. Let $f$ be the ch.f., then by (1), $f'(0) = 0$, $f''(0) = -1$. The ch.f.
of $X + Y$ is $f(t)^2$ and that of $X - Y$ is $f(t)f(-t)$. Since these two r.v.'s are
independent, the ch.f. $f(2t)$ of their sum $2X$ must satisfy the following relation:

(9) $$f(2t) = f(t)^3 f(-t).$$

It follows from (9) that the function $f$ never vanishes. For if it did at $t_0$, then
it would also at $t_0/2$, and so by induction at $t_0/2^n$ for every $n \ge 1$. This is
impossible, since $\lim_{n\to\infty} f(t_0/2^n) = f(0) = 1$. Setting for every $t$:

$$\rho(t) = \frac{f(t)}{f(-t)},$$

we obtain

(10) $$\rho(2t) = \rho(t)^2.$$

Hence we have by iteration, for each $t$:

$$\rho(t) = \rho\left(\frac{t}{2^n}\right)^{2^n} = \left\{1 + o\left(\frac{t}{2^n}\right)\right\}^{2^n} \to 1$$

by Theorem 6.4.2 and the previous lemma. Thus $\rho(t) \equiv 1$, $f(t) \equiv f(-t)$, and
(9) becomes

(11) $$f(2t) = f(t)^4.$$

Repeating the argument above with (11), we have

$$f(t) = f\left(\frac{t}{2^n}\right)^{4^n} = \left\{1 - \frac{1}{2}\left(\frac{t}{2^n}\right)^2 + o\left(\left(\frac{t}{2^n}\right)^2\right)\right\}^{4^n} \to e^{-t^2/2}.$$

This proves the theorem without the use of "logarithms" (see Sec. 7.6).
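The functional equation (11) singles out the normal ch.f. among ch.f.'s with mean 0 and variance 1. As an illustrative sketch (the comparison distribution, symmetric $\pm 1$ steps with ch.f. $\cos t$, and the helper `defect` are choices made here), one can compare how well $e^{-t^2/2}$ and $\cos t$ satisfy $f(2t) = f(t)^4$:

```python
import math

def defect(f, t):
    """Size of the failure of relation (11), f(2t) = f(t)^4, at the point t."""
    return abs(f(2.0 * t) - f(t) ** 4)

def normal_chf(t):
    return math.exp(-t * t / 2.0)

rademacher_chf = math.cos      # mean 0, variance 1, but not normal

t = 0.7
print(defect(normal_chf, t), defect(rademacher_chf, t))
```

Both ch.f.'s are real, hence symmetric, so for them (9) and (11) coincide; only the normal one satisfies the relation.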


EXERCISES 
*1. If $f$ is the ch.f. of $X$, and

$$\lim_{t\to 0} \frac{f(t) - 1}{t^2} = -\frac{\sigma^2}{2} > -\infty,$$

then $\mathcal{E}(X) = 0$ and $\mathcal{E}(X^2) = \sigma^2$. In particular, if $f(t) = 1 + o(t^2)$ as $t \to 0$,
then $f \equiv 1$.

*2. Let $\{X_n\}$ be independent, identically distributed with mean 0 and vari-
ance $\sigma^2$, $0 \le \sigma^2 \le \infty$. Prove that

$$\lim_{n\to\infty} \mathcal{E}\left(\frac{|S_n|}{\sqrt{n}}\right) = 2\lim_{n\to\infty} \mathcal{E}\left(\frac{S_n^+}{\sqrt{n}}\right) = \sqrt{\frac{2}{\pi}}\,\sigma.$$

[If we assume only $\mathcal{P}\{X_1 \ne 0\} > 0$, $\mathcal{E}(|X_1|) < \infty$ and $\mathcal{E}(X_1) = 0$, then we
have $\mathcal{E}(|S_n|) \ge C\sqrt{n}$ for some constant $C$ and all $n$; this is known as
Hornich's inequality.] [HINT: In case $\sigma^2 = \infty$, if $\lim_n \mathcal{E}(|S_n/\sqrt{n}|) < \infty$, then
there exists $\{n_k\}$ such that $S_{n_k}/\sqrt{n_k}$ converges in distribution; use an extension
of Exercise 1 to show $|f(t/\sqrt{n})|^{2n} \to 0$. This is due to P. Matthews.]

3. Let $\mathcal{P}\{X = k\} = p_k$, $1 \le k \le \ell < \infty$, $\sum_{k=1}^{\ell} p_k = 1$. The sum $S_n$ of
$n$ independent r.v.'s having the same distribution as $X$ is said to have a
multinomial distribution. Define it explicitly. Prove that $[S_n - \mathcal{E}(S_n)]/\sigma(S_n)$
converges to $\Phi$ in distribution as $n \to \infty$, provided that $\sigma(X) > 0$.

*4. Let $X_n$ have the binomial distribution with parameter $(n, p_n)$, and
suppose that $np_n \to \lambda \ge 0$. Prove that $X_n$ converges in dist. to the Poisson d.f.
with parameter $\lambda$. (In the old days this was called the law of small numbers.)

5. Let $X_\lambda$ have the Poisson distribution with parameter $\lambda$. Prove that
$[X_\lambda - \lambda]/\lambda^{1/2}$ converges in dist. to $\Phi$ as $\lambda \to \infty$.

*6. Prove that in Theorem 6.4.4, $S_n/\sigma\sqrt{n}$ does not converge in proba-
bility. [HINT: Consider $S_n/\sigma\sqrt{n}$ and $S_{2n}/\sigma\sqrt{2n}$.]

7. Let $f$ be the ch.f. of the d.f. $F$. Suppose that as $t \to 0$,

$$f(t) - 1 = O(|t|^{\alpha}),$$

where $0 < \alpha \le 2$, then as $A \to \infty$,

$$\int_{|x|>A} dF(x) = O(A^{-\alpha}).$$

[HINT: Integrate $\int_{|x|>A} (1 - \cos tx)\,dF(x) \le Ct^{\alpha}$ over $t$ in (0, A).]

8. If $0 < \alpha < 1$ and $\int |x|^{\alpha}\,dF(x) < \infty$, then $f(t) - 1 = o(|t|^{\alpha})$ as $t \to 0$.
For $1 \le \alpha < 2$ the same result is true under the additional assumption that
$\int x\,dF(x) = 0$. [HINT: The case $1 \le \alpha < 2$ is harder. Consider the real and
imaginary parts of $f(t) - 1$ separately and write the latter as

$$\int_{|x|\le\epsilon/|t|} \sin tx\,dF(x) + \int_{|x|>\epsilon/|t|} \sin tx\,dF(x).$$

The second is bounded by $(|t|/\epsilon)^{\alpha} \int_{|x|>\epsilon/|t|} |x|^{\alpha}\,dF(x) = o(|t|^{\alpha})$ for fixed $\epsilon$. In
the first integral use $\sin tx = tx + O(|tx|^3)$,

$$\int_{|x|\le\epsilon/|t|} tx\,dF(x) = -t\int_{|x|>\epsilon/|t|} x\,dF(x),$$

$$\int_{|x|\le\epsilon/|t|} |tx|^3\,dF(x) \le \epsilon^{3-\alpha}\int_{-\infty}^{\infty} |tx|^{\alpha}\,dF(x).]$$


9. Suppose that $e^{-c|t|^{\alpha}}$, where $c \ge 0$, $0 < \alpha \le 2$, is a ch.f. (Theorem 6.5.4
below). Let $\{X_j, j \ge 1\}$ be independent and identically distributed r.v.'s with
a common ch.f. of the form

$$1 - \beta|t|^{\alpha} + o(|t|^{\alpha})$$

as $t \to 0$. Determine the constants $b$ and $\theta$ so that the ch.f. of $S_n/bn^{\theta}$ converges
to $e^{-|t|^{\alpha}}$.

10. Suppose $F$ satisfies the condition that for every $\eta > 0$ such that as
$A \to \infty$,

$$\int_{|x|>A} dF(x) = O(e^{-\eta A}).$$

Then all moments of $F$ are finite, and condition (6) in Theorem 6.4.5 is satisfied.

11. Let $X$ and $Y$ be independent with the common d.f. $F$ of mean 0 and
variance 1. Suppose that $(X + Y)/\sqrt{2}$ also has the d.f. $F$. Then $F \equiv \Phi$. [HINT:
Imitate Theorem 6.4.5.]

*12. Let $\{X_j, j \ge 1\}$ be independent, identically distributed r.v.'s with
mean 0 and variance 1. Prove that both

$$\frac{\sum_{j=1}^{n} X_j}{\sqrt{\sum_{j=1}^{n} X_j^2}} \quad \text{and} \quad \frac{\sqrt{n}\,\sum_{j=1}^{n} X_j}{\sum_{j=1}^{n} X_j^2}$$

converge in dist. to $\Phi$. [HINT: Use the law of large numbers.]

13. The converse part of Theorem 6.4.1 is false for an odd $k$. Example.
$F$ is a discrete symmetric d.f. with mass $C/n^2 \log n$ for integers $n \ge 3$, where
$C$ is the appropriate constant, and $k = 1$. [HINT: It is well known that the series

$$\sum_{n} \frac{\sin nt}{n \log n}$$

converges uniformly in $t$.]

We end this section with a discussion of a special class of distributions
and their ch.f.'s that is of both theoretical and practical importance.

A distribution or p.m. on $\mathcal{R}^1$ is said to be of the lattice type iff its
support is in an arithmetical progression; that is, it is completely atomic
(i.e., discrete) with all the atoms located at points of the form $\{a + jd\}$, where
$a$ is real, $d > 0$, and $j$ ranges over a certain nonempty set of integers. The
corresponding lattice d.f. is of form:

$$F(x) = \sum_{j=-\infty}^{\infty} p_j\,\delta_{a+jd}(x),$$

where $p_j \ge 0$ and $\sum_{j=-\infty}^{\infty} p_j = 1$. Its ch.f. is

(12) $$f(t) = e^{ait} \sum_{j=-\infty}^{\infty} p_j e^{jdit},$$

which is an absolutely convergent Fourier series. Note that the degenerate d.f.
$\delta_a$ with ch.f. $e^{ait}$ is a particular case. We have the following characterization.

Theorem 6.4.7. A ch.f. is that of a lattice distribution if and only if there
exists a $t_0 \ne 0$ such that $|f(t_0)| = 1$.

PROOF. The "only if" part is trivial, since it is obvious from (12) that $|f|$
is periodic of period $2\pi/d$. To prove the "if" part, let $f(t_0) = e^{i\theta_0}$, where $\theta_0$
is real; then we have

$$1 = e^{-i\theta_0} f(t_0) = \int e^{i(t_0x - \theta_0)}\,\mu(dx)$$

and consequently, taking real parts and transposing:

(13) $$0 = \int [1 - \cos(t_0x - \theta_0)]\,\mu(dx).$$

The integrand is positive everywhere and vanishes if and only if for some
integer $j$,

$$x = \frac{\theta_0}{t_0} + j\left(\frac{2\pi}{t_0}\right).$$

It follows that the support of $\mu$ must be contained in the set of $x$ of this form
in order that equation (13) may hold, for the integral of a strictly positive
function over a set of strictly positive measure is strictly positive. The theorem
is therefore proved, with $a = \theta_0/t_0$ and $d = 2\pi/t_0$ in the definition of a lattice
distribution.

It should be observed that neither "$a$" nor "$d$" is uniquely determined
above; for if we take, e.g., $a' = a + d'$ and $d'$ a divisor of $d$, then the support
of $\mu$ is also contained in the arithmetical progression $\{a' + jd'\}$. However,
unless $\mu$ is degenerate, there is a unique maximum $d$, which is called the
"span" of the lattice distribution. The following corollary is easy.


Corollary. Unless $|f| \equiv 1$, there is a smallest $t_0 > 0$ such that $|f(t_0)| = 1$.
The span is then $2\pi/t_0$.
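Theorem 6.4.7 and the corollary can be illustrated on a small lattice distribution. The sketch below is built on assumed values introduced here ($a = 0.5$, $d = 2$, and three atoms at $a$, $a + d$, $a + 3d$, so that the span is $d$); it evaluates $|f|$ at $t_0 = 2\pi/d$ and at a generic point.

```python
import cmath
import math

def chf(atoms, t):
    """Ch.f. of a discrete law given as {point: mass}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in atoms.items())

a, d = 0.5, 2.0                                   # assumed lattice parameters
atoms = {a: 0.2, a + d: 0.5, a + 3 * d: 0.3}      # atoms on {a + j d}, j = 0, 1, 3

t0 = 2.0 * math.pi / d
modulus_at_t0 = abs(chf(atoms, t0))       # equals 1 up to rounding
modulus_elsewhere = abs(chf(atoms, 1.0))  # strictly less than 1
print(modulus_at_t0, modulus_elsewhere)
```

Note that $f(t_0)$ itself equals $e^{iat_0}$ here, which is not 1 because $a$ is not a multiple of $d$; it is the modulus that equals 1.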


Of particular interest is the "integer lattice" when $a = 0$, $d = 1$; that is,
when the support of $\mu$ is a set of integers at least two of which differ by 1. We
have seen many examples of these. The ch.f. $f$ of such an r.v. $X$ has period
$2\pi$, and the following simple inversion formula holds, for each integer $j$:

(14) $$\mathcal{P}(X = j) = p_j = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-ijt}\,dt,$$

where the range of integration may be replaced by any interval of length
$2\pi$. This, of course, is nothing but the well-known formula for the "Fourier
coefficients" of $f$. If $\{X_k\}$ is a sequence of independent, identically distributed
r.v.'s with the ch.f. $f$, then $S_n = \sum_{k=1}^{n} X_k$ has the ch.f. $(f)^n$, and the inversion
formula above yields:

(15) $$\mathcal{P}(S_n = j) = \frac{1}{2\pi}\int_{-\pi}^{\pi} [f(t)]^n\,e^{-ijt}\,dt.$$

This may be used to advantage to obtain estimates for $S_n$ (see Exercises 24
to 26 below).
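Formulas (14) and (15) can be tried out numerically. The following sketch assumes, for illustration, Bernoulli steps with $p = 0.3$ and $n = 5$, so that $S_5$ is binomial and $[f(t)]^5 = (0.7 + 0.3e^{it})^5$; the integral is approximated by an equally spaced rectangle rule, which is exact (up to rounding) for trigonometric polynomials of degree below the number of nodes. The function name `lattice_probability` is hypothetical.

```python
import cmath
import math

def lattice_probability(f, j, points=64):
    """Approximate (1/2pi) * integral_{-pi}^{pi} f(t) e^{-ijt} dt, cf. (14),
    by the rectangle rule on `points` equally spaced nodes."""
    h = 2.0 * math.pi / points
    total = 0.0 + 0.0j
    for k in range(points):
        t = -math.pi + k * h
        total += f(t) * cmath.exp(-1j * j * t)
    return (total * h / (2.0 * math.pi)).real

def chf_s5(t):
    """Ch.f. of S_5 for Bernoulli(0.3) steps (assumed example), cf. (15)."""
    return (0.7 + 0.3 * cmath.exp(1j * t)) ** 5

for j in range(6):
    exact = math.comb(5, j) * 0.3**j * 0.7**(5 - j)
    print(j, lattice_probability(chf_s5, j), exact)
```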


EXERCISES 


$f$ or $f_n$ is a ch.f. below.

14. If $|f(t)| = 1$, $|f(t')| = 1$ and $t/t'$ is an irrational number, then $f$ is
degenerate. If for a sequence $\{t_k\}$ of nonvanishing constants tending to 0 we
have $|f(t_k)| = 1$, then $f$ is degenerate.

*15. If $|f_n(t)| \to 1$ for every $t$ as $n \to \infty$, and $F_n$ is the d.f. corre-
sponding to $f_n$, then there exist constants $a_n$ such that $F_n(x + a_n) \xrightarrow{v} \delta_0$. [HINT:
Symmetrize and take $a_n$ to be a median of $F_n$.]

*16. Suppose $b_n > 0$ and $|f(b_nt)|$ converges everywhere to a ch.f. that
is not identically 1, then $b_n$ converges to a finite and strictly positive limit.
[HINT: Show that it is impossible that a subsequence of $b_n$ converges to 0 or
to $+\infty$, or that two subsequences converge to different finite limits.]

*17. Suppose $c_n$ is real and that $e^{ic_nt}$ converges to a limit for every $t$
in a set of strictly positive Lebesgue measure. Then $c_n$ converges to a finite
limit. [HINT: Proceed as in Exercise 16, and integrate over $t$. Beware of any
argument using "logarithms", as given in some textbooks, but see Exercise 12
of Sec. 7.6 later.]


*18. Let $f$ and $g$ be two nondegenerate ch.f.'s. Suppose that there exist
real constants $a_n$ and $b_n > 0$ such that for every $t$:

$$f_n(t) \to f(t) \quad\text{and}\quad e^{ita_n/b_n} f_n\!\left(\frac{t}{b_n}\right) \to g(t).$$

Then $a_n \to a$, $b_n \to b$, where $a$ is finite, $0 < b < \infty$, and $g(t) = e^{ita/b} f(t/b)$.
[HINT: Use Exercises 16 and 17.]


*19. Reformulate Exercise 18 in terms of d.f.'s and deduce the following
consequence. Let $F_n$ be a sequence of d.f.'s, $a_n$, $a_n'$ real constants, $b_n > 0$,
$b_n' > 0$. If

$$F_n(b_n x + a_n) \xrightarrow{v} F(x) \quad\text{and}\quad F_n(b_n' x + a_n') \xrightarrow{v} F(x),$$

where $F$ is a nondegenerate d.f., then

$$\frac{b_n}{b_n'} \to 1 \quad\text{and}\quad \frac{a_n - a_n'}{b_n} \to 0.$$

[Two d.f.'s $F$ and $G$ such that $G(x) = F(bx + a)$ for every $x$, where $b > 0$
and $a$ is real, are said to be of the same "type". The two preceding exercises
deal with the convergence of types.]

20. Show by using (14) that $|\cos t|$ is not a ch.f. Thus the modulus of a
ch.f. need not be a ch.f., although the squared modulus always is.

21. The span of an integer lattice distribution is the greatest common 
divisor of the set of all differences between points of jump. 

22. Let $f(s, t)$ be the ch.f. of a 2-dimensional p.m. $\nu$. If $|f(s_0, t_0)| = 1$
for some $(s_0, t_0) \ne (0, 0)$, what can one say about the support of $\nu$?

*23. If $\{X_n\}$ is a sequence of independent and identically distributed r.v.'s,
then there does not exist a sequence of constants $\{c_n\}$ such that $\sum_n (X_n - c_n)$
converges a.e., unless the common d.f. is degenerate.

In Exercises 24 to 26, let $S_n = \sum_{j=1}^{n} X_j$, where the $X_j$'s are independent r.v.'s
with a common d.f. $F$ of the integer lattice type with span 1, and taking both
$> 0$ and $< 0$ values.

*24. If $\int x\,dF(x) = 0$, $\int x^2\,dF(x) = \sigma^2$, then for each integer $j$:

$$n^{1/2}\,\mathscr{P}\{S_n = j\} \to \frac{1}{\sigma\sqrt{2\pi}}.$$

[HINT: Proceed as in Theorem 6.4.4, but use (15).]


25. If $F \ne \delta_0$, then there exists a constant $A$ such that for every $j$:

$$\mathscr{P}\{S_n = j\} \le A n^{-1/2}.$$

[HINT: Use a special case of Exercise 27 below.]

26. If $F$ is symmetric and $\int |x|\,dF(x) < \infty$, then

$$n\,\mathscr{P}\{S_n = j\} \to \infty.$$

[HINT: $1 - f(t) = o(|t|)$ as $t \to 0$.]


27. If $f$ is any nondegenerate ch.f., then there exist constants $A > 0$
and $\delta > 0$ such that

$$|f(t)| \le 1 - At^2 \quad\text{for } |t| \le \delta.$$

[HINT: Reduce to the case where the d.f. has zero mean and finite variance by
translating and truncating.]

28. Let $Q_n$ be the concentration function of $S_n = \sum_{j=1}^{n} X_j$, where the
$X_j$'s are independent r.v.'s having a common nondegenerate d.f. $F$. Then for
every $h > 0$,

$$Q_n(h) \le A n^{-1/2}.$$

[HINT: Use Exercise 27 above and Exercise 16 of Sec. 6.1. This result is due
to Lévy and Doeblin, but the proof is due to Rosén.]

In Exercises 29 to 35, $\mu$ or $\mu_k$ is a p.m. on $\mathscr{U} = (0, 1]$.

29. Define for each $n$:

$$f_\mu(n) = \int_{\mathscr{U}} e^{2\pi i n x}\,\mu(dx).$$

Prove by Weierstrass's approximation theorem (by trigonometrical polynomials)
that if $f_{\mu_1}(n) = f_{\mu_2}(n)$ for every $n \ge 1$, then $\mu_1 = \mu_2$. The conclusion
becomes false if $\mathscr{U}$ is replaced by $[0, 1]$.

30. Establish the inversion formula expressing $\mu$ in terms of the $f_\mu(n)$'s.
Deduce again the uniqueness result in Exercise 29. [HINT: Find the Fourier
series of the indicator function of an interval contained in $\mathscr{U}$.]

31. Prove that $|f_\mu(n)| = 1$ if and only if $\mu$ has its support in the set
$\{\theta_0 + jn^{-1},\ 0 \le j \le n - 1\}$ for some $\theta_0$ in $(0, n^{-1}]$.

*32. $\mu$ is equidistributed on the set $\{jn^{-1},\ 0 \le j \le n - 1\}$ if and only if
$f_\mu(j) = 0$ or $1$ according to $n \nmid j$ or $n \mid j$ (that is, according as $j$ is not or is a multiple of $n$).

*33. $\mu_k \xrightarrow{v} \mu$ if and only if $f_{\mu_k}(\cdot) \to f_\mu(\cdot)$ everywhere.

34. Suppose that the space $\mathscr{U}$ is replaced by its closure $[0, 1]$ and the
two points 0 and 1 are identified; in other words, suppose $\mathscr{U}$ is regarded as the
circumference $\bigcirc$ of a circle; then we can improve the result in Exercise 33
as follows. If there exists a function $g$ on the integers such that $f_{\mu_k}(\cdot) \to g(\cdot)$
everywhere, then there exists a p.m. $\mu$ on $\bigcirc$ such that $g = f_\mu$ and $\mu_k \xrightarrow{v} \mu$
on $\bigcirc$.

35. Random variables defined on $\bigcirc$ are also referred to as defined
"modulo 1". The theory of addition of independent r.v.'s on $\bigcirc$ is somewhat
simpler than on $\mathscr{R}^1$, as exemplified by the following theorem. Let $\{X_j, j \ge 1\}$
be independent and identically distributed r.v.'s on $\bigcirc$ and let $S_k = \sum_{j=1}^{k} X_j$.
Then there are only two possibilities for the asymptotic distributions of $S_k$.
Either there exist a constant $c$ and an integer $n \ge 1$ such that $S_k - kc$ converges
in dist. to the equidistribution on $\{jn^{-1},\ 0 \le j \le n - 1\}$; or $S_k$ converges in
dist. to the uniform distribution on $\bigcirc$. [HINT: Consider the possible limits of
$(f_\mu(n))^k$ as $k \to \infty$, for each $n$.]


6.5 Representation theorems 


A ch.f. is defined to be the Fourier–Stieltjes transform of a p.m. Can it be
characterized by some other properties? This question arises because frequently
such a function presents itself in a natural fashion while the underlying measure 
is hidden and has to be recognized. Several answers are known, but the 
following characterization, due to Bochner and Herglotz, is most useful. It 
plays a basic role in harmonic analysis and in the theory of second-order 
stationary processes. 

A complex-valued function $f$ defined on $\mathscr{R}^1$ is called positive definite iff
for any finite set of real numbers $t_j$ and complex numbers $z_j$ (with conjugate
complex $\bar z_j$), $1 \le j \le n$, we have

$$\sum_{j=1}^{n}\sum_{k=1}^{n} f(t_j - t_k)\,z_j \bar z_k \ge 0. \tag{1}$$
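
The definition can be probed numerically. The sketch below is an illustration added here, not taken from the text: it evaluates the left member of (1) for the Cauchy ch.f. $e^{-|t|}$ and, for contrast, for the indicator of $[-1, 1]$, using three points and one hand-picked vector $z$ (both choices are assumptions made for the example). The indicator yields a strictly negative value, so it is not positive definite.

```python
import math

def quad_form(f, ts, zs):
    """Left member of (1): sum over j, k of f(t_j - t_k) * z_j * conj(z_k)."""
    total = 0j
    for tj, zj in zip(ts, zs):
        for tk, zk in zip(ts, zs):
            total += f(tj - tk) * zj * zk.conjugate()
    return total

def cauchy_chf(t):
    """ch.f. of the Cauchy distribution."""
    return math.exp(-abs(t))

def indicator(t):
    """Indicator of [-1, 1]; used here only as a counterexample."""
    return 1.0 if abs(t) <= 1.0 else 0.0

ts = [0.0, 1.0, 2.0]
zs = [1 + 0j, -math.sqrt(2) + 0j, 1 + 0j]

q_cauchy = quad_form(cauchy_chf, ts, zs)    # real part is positive
q_indicator = quad_form(indicator, ts, zs)  # equals 4 - 4*sqrt(2), which is negative
```

A single nonnegative value proves nothing, of course; a single negative value does disprove positive definiteness.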
Let us deduce at once some elementary properties of such a function. 
Theorem 6.5.1. If $f$ is positive definite, then for each $t \in \mathscr{R}^1$:

$$f(-t) = \overline{f(t)}, \qquad |f(t)| \le f(0).$$

If $f$ is continuous at $t = 0$, then it is uniformly continuous in $\mathscr{R}^1$. In
this case, we have for every continuous complex-valued function $\zeta$ on $\mathscr{R}^1$ and
every $T > 0$:

$$\int_0^T\!\!\int_0^T f(s - t)\,\zeta(s)\,\overline{\zeta(t)}\,ds\,dt \ge 0. \tag{2}$$

PROOF. Taking $n = 1$, $t_1 = 0$, $z_1 = 1$ in (1), we see that

$$f(0) \ge 0.$$

Taking $n = 2$, $t_1 = 0$, $t_2 = t$, $z_1 = z_2 = 1$, we have

$$2f(0) + f(t) + f(-t) \ge 0;$$

changing $z_2$ to $i$, we have

$$f(0) + f(t)\,i - f(-t)\,i + f(0) \ge 0.$$

Hence $f(t) + f(-t)$ is real and $f(t) - f(-t)$ is pure imaginary, which imply
that $\overline{f(t)} = f(-t)$. Changing $z_1$ to $f(t)$ and $z_2$ to $-|f(t)|$, we obtain

$$2f(0)|f(t)|^2 - 2|f(t)|^3 \ge 0.$$

Hence $f(0) \ge |f(t)|$, whether $|f(t)| = 0$ or $> 0$. Now, if $f(0) = 0$, then
$f(\cdot) \equiv 0$; otherwise we may suppose that $f(0) = 1$. If we then take $n = 3$,
$t_1 = 0$, $t_2 = t$, $t_3 = t + h$, a well-known result in positive definite quadratic
forms implies that the determinant below must have a positive value:

$$\begin{vmatrix} f(0) & f(-t) & f(-t-h) \\ f(t) & f(0) & f(-h) \\ f(t+h) & f(h) & f(0) \end{vmatrix}
= 1 - |f(t)|^2 - |f(t+h)|^2 - |f(h)|^2 + 2\Re\{f(t)f(h)\overline{f(t+h)}\} \ge 0.$$

It follows that

$$\begin{aligned}
|f(t) - f(t+h)|^2 &= |f(t)|^2 + |f(t+h)|^2 - 2\Re\{f(t)\overline{f(t+h)}\} \\
&\le 1 - |f(h)|^2 + 2\Re\{f(t)\overline{f(t+h)}[f(h) - 1]\} \\
&\le 1 - |f(h)|^2 + 2|1 - f(h)| \le 4|1 - f(h)|.
\end{aligned}$$

Thus the "modulus of continuity" at each $t$ is bounded by twice the square
root of that at 0; and so continuity at 0 implies uniform continuity everywhere.
Finally, the integrand of the double integral in (2) being continuous,
the integral is the limit of Riemann sums, hence it is positive because these
sums are by (1).
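
The continuity estimate obtained in the proof, $|f(t) - f(t+h)|^2 \le 4|1 - f(h)|$, is easy to spot-check. The following sketch, added for illustration, tests it on a grid for two known ch.f.'s; the grid, the tolerance and the function names are assumptions of the example, and a grid check is of course no substitute for the proof.

```python
import cmath
import math

def continuity_bound_holds(f, ts, hs, tol=1e-12):
    """Check |f(t) - f(t+h)|^2 <= 4 |1 - f(h)| on a grid (assumes f(0) = 1)."""
    return all(
        abs(f(t) - f(t + h)) ** 2 <= 4.0 * abs(1.0 - f(h)) + tol
        for t in ts
        for h in hs
    )

def cauchy_chf(t):
    """ch.f. of the Cauchy distribution."""
    return math.exp(-abs(t))

def unit_mass_chf(t):
    """ch.f. of the unit mass at 1, namely e^{it}."""
    return cmath.exp(1j * t)

grid = [k / 10.0 for k in range(-50, 51)]
```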


Theorem 6.5.2. $f$ is a ch.f. if and only if it is positive definite and continuous
at 0 with $f(0) = 1$.

Remark. It follows trivially that $f$ is the Fourier–Stieltjes transform

$$\int_{\mathscr{R}^1} e^{itx}\,\nu(dx)$$

of a finite measure $\nu$ if and only if it is positive definite and finite continuous
at 0: then $\nu(\mathscr{R}^1) = f(0)$.

PROOF. If $f$ is the ch.f. of the p.m. $\mu$, then we need only verify that it is
positive definite. This is immediate, since the left member of (1) is then

$$\int \sum_{j=1}^{n}\sum_{k=1}^{n} e^{it_j x} z_j\,\overline{e^{it_k x} z_k}\,\mu(dx) = \int \Bigl|\sum_{j=1}^{n} e^{it_j x} z_j\Bigr|^2 \mu(dx) \ge 0.$$

Conversely, if $f$ is positive definite and continuous at 0, then by
Theorem 6.5.1, (2) holds for $\zeta(t) = e^{-itx}$. Thus

$$\frac{1}{2\pi T}\int_0^T\!\!\int_0^T f(s - t)\,e^{-i(s-t)x}\,ds\,dt \ge 0. \tag{3}$$


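Denote the left member by $p_T(x)$; by a change of variable and partial integration,
we have

$$p_T(x) = \frac{1}{2\pi}\int_{-T}^{T}\Bigl(1 - \frac{|t|}{T}\Bigr) f(t)\,e^{-itx}\,dt. \tag{4}$$

Now observe that for $\alpha > 0$,

$$\frac{1}{\alpha}\int_0^\alpha d\beta \int_{-\beta}^{\beta} e^{-itx}\,dx = \frac{1}{\alpha}\int_0^\alpha \frac{2\sin\beta t}{t}\,d\beta = \frac{2(1 - \cos\alpha t)}{\alpha t^2}$$

(where at $t = 0$ the limit value is meant, similarly later); it follows that

$$\begin{aligned}
\frac{1}{\alpha}\int_0^\alpha d\beta \int_{-\beta}^{\beta} p_T(x)\,dx &= \frac{1}{\pi}\int_{-\infty}^{\infty} f_T(t)\,\frac{1 - \cos\alpha t}{\alpha t^2}\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} f_T\Bigl(\frac{t}{\alpha}\Bigr)\frac{1 - \cos t}{t^2}\,dt,
\end{aligned}$$

where

$$f_T(t) = \begin{cases} \Bigl(1 - \dfrac{|t|}{T}\Bigr) f(t), & \text{if } |t| \le T; \\[4pt] 0, & \text{if } |t| > T. \end{cases} \tag{5}$$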

Note that this is the product of $f$ by a ch.f. (see Exercise 2 of Sec. 6.2)
and corresponds to the smoothing of the would-be density which does not
necessarily exist.

Since $|f_T(t)| \le |f(t)| \le 1$ by Theorem 6.5.1, and $(1 - \cos t)/t^2$ belongs to
$L^1(-\infty, \infty)$, we have by dominated convergence:

$$\begin{aligned}
\lim_{\alpha\to\infty}\frac{1}{\alpha}\int_0^\alpha d\beta \int_{-\beta}^{\beta} p_T(x)\,dx &= \frac{1}{\pi}\int_{-\infty}^{\infty} \lim_{\alpha\to\infty} f_T\Bigl(\frac{t}{\alpha}\Bigr)\frac{1 - \cos t}{t^2}\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{1 - \cos t}{t^2}\,dt = 1.
\end{aligned} \tag{6}$$

Here the second equation follows from the continuity of $f_T$ at 0 with
$f_T(0) = 1$, and the third equation from formula (3) of Sec. 6.2. Since $p_T \ge 0$,
the integral $\int_{-\beta}^{\beta} p_T(x)\,dx$ is increasing in $\beta$, hence the existence of the limit of
its "integral average" in (6) implies that of the plain limit as $\beta \to \infty$, namely:

$$\int_{-\infty}^{\infty} p_T(x)\,dx = \lim_{\beta\to\infty}\int_{-\beta}^{\beta} p_T(x)\,dx = 1. \tag{7}$$


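Therefore $p_T$ is a probability density function. Returning to (3) and observing
that for real $\tau$:

$$\int_{-\beta}^{\beta} e^{i\tau x} e^{-itx}\,dx = \frac{2\sin\beta(\tau - t)}{\tau - t},$$

we obtain, similarly to (4):

$$\begin{aligned}
\frac{1}{\alpha}\int_0^\alpha d\beta \int_{-\beta}^{\beta} e^{i\tau x} p_T(x)\,dx &= \frac{1}{\pi}\int_{-\infty}^{\infty} f_T(t)\,\frac{1 - \cos\alpha(\tau - t)}{\alpha(\tau - t)^2}\,dt \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty} f_T\Bigl(\tau - \frac{t}{\alpha}\Bigr)\frac{1 - \cos t}{t^2}\,dt.
\end{aligned}$$

Note that the last two integrals are in reality over finite intervals. Letting
$\alpha \to \infty$, we obtain by bounded convergence as before:

$$\int_{-\infty}^{\infty} e^{i\tau x} p_T(x)\,dx = f_T(\tau), \tag{8}$$

the integral on the left existing by (7). Since equation (8) is valid for each $\tau$,
we have proved that $f_T$ is the ch.f. of the density function $p_T$. Finally, since
$f_T(\tau) \to f(\tau)$ as $T \to \infty$ for every $\tau$, and $f$ is by hypothesis continuous at
$\tau = 0$, Theorem 6.3.2 yields the desired conclusion that $f$ is a ch.f.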
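
To see the smoothed density $p_T$ of (4) at work, one may take the Cauchy ch.f. $f(t) = e^{-|t|}$, an assumed test case not used in the proof. Since this $f$ is real and even, (4) reduces to $p_T(x) = \frac{1}{\pi}\int_0^T (1 - t/T)\,e^{-t}\cos tx\,dt$. The sketch below approximates this by Simpson's rule (the step count is an arbitrary choice) so that positivity and the approach to the Cauchy density $1/\pi(1+x^2)$ as $T$ grows can be inspected.

```python
import math

def p_T(x, T, steps=20000):
    """Simpson approximation of (4) for f(t) = exp(-|t|):
    p_T(x) = (1/pi) * integral_0^T (1 - t/T) exp(-t) cos(t x) dt."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = T / steps

    def g(t):
        return (1.0 - t / T) * math.exp(-t) * math.cos(t * x)

    total = g(0.0) + g(T)
    for m in range(1, steps):
        total += (4.0 if m % 2 else 2.0) * g(m * h)
    return total * h / (3.0 * math.pi)

def cauchy_density(x):
    """Density whose ch.f. is exp(-|t|)."""
    return 1.0 / (math.pi * (1.0 + x * x))
```

At $x = 0$ the integral can be done by hand, giving $p_T(0) = \pi^{-1}(1 - T^{-1} + T^{-1}e^{-T})$, which offers an independent check of the quadrature.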


As a typical application of the preceding theorem, consider a family of
(real-valued) r.v.'s $\{X_t, t \in \mathscr{R}_+\}$, where $\mathscr{R}_+ = [0, \infty)$, satisfying the following
conditions, for every $s$ and $t$ in $\mathscr{R}_+$:

(i) $\mathscr{E}(X_t^2) = 1$;
(ii) there exists a function $r(\cdot)$ on $\mathscr{R}^1$ such that $\mathscr{E}(X_s X_t) = r(s - t)$;
(iii) $\lim_{t \downarrow 0} \mathscr{E}((X_0 - X_t)^2) = 0$.

A family satisfying (i) and (ii) is called a second-order stationary process or
stationary process in the wide sense; and condition (iii) means that the process
is continuous in $L^2(\Omega, \mathscr{F}, \mathscr{P})$.

For every finite set of $t_j$ and $z_j$ as in the definition of positive definiteness,
we have

$$0 \le \mathscr{E}\Bigl\{\Bigl|\sum_{j=1}^{n} X_{t_j} z_j\Bigr|^2\Bigr\} = \sum_{j=1}^{n}\sum_{k=1}^{n} \mathscr{E}(X_{t_j} X_{t_k})\,z_j \bar z_k = \sum_{j=1}^{n}\sum_{k=1}^{n} r(t_j - t_k)\,z_j \bar z_k.$$

Thus $r$ is a positive definite function. Next, we have

$$r(0) - r(t) = \mathscr{E}(X_0(X_0 - X_t)),$$

hence by the Cauchy–Schwarz inequality,

$$|r(t) - r(0)|^2 \le \mathscr{E}(X_0^2)\,\mathscr{E}((X_0 - X_t)^2).$$

It follows that $r$ is continuous at 0, with $r(0) = 1$. Hence, by Theorem 6.5.2,
$r$ is the ch.f. of a uniquely determined p.m. $R$:

$$r(t) = \int_{\mathscr{R}^1} e^{itx}\,R(dx).$$

This $R$ is called the spectral distribution of the process and is essential in its
further analysis.
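
A simple worked instance, supplied here for illustration and not taken from the text, may help fix ideas. The r.v.'s $A$, $B$ and the frequency $\xi$ below are assumptions of the example: $A$ and $B$ are real with $\mathscr{E}(A^2) = \mathscr{E}(B^2) = 1$ and $\mathscr{E}(AB) = 0$, and $\xi$ is a real constant.

```latex
\begin{align*}
X_t &= A\cos\xi t + B\sin\xi t, \\
\mathscr{E}(X_s X_t) &= \cos\xi s\,\cos\xi t + \sin\xi s\,\sin\xi t = \cos\xi(s-t) = r(s-t), \\
\mathscr{E}\bigl((X_0 - X_t)^2\bigr) &= 2 - 2\cos\xi t \to 0 \quad (t \downarrow 0), \\
r(t) &= \cos\xi t = \int_{\mathscr{R}^1} e^{itx}\,R(dx), \qquad R = \tfrac12(\delta_{\xi} + \delta_{-\xi}).
\end{align*}
```

Conditions (i) to (iii) hold, and the spectral distribution puts mass $\tfrac12$ at each of the frequencies $\pm\xi$.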

Theorem 6.5.2 is not practical in recognizing special ch.f.'s or in
constructing them. In contrast, there is a sufficient condition due to Pólya
that is very easy to apply.


Theorem 6.5.3. Let $f$ on $\mathscr{R}^1$ satisfy the following conditions, for each $t$:

$$f(0) = 1, \qquad f(t) \ge 0, \qquad f(t) = f(-t), \tag{9}$$

$f$ is decreasing and continuous convex in $\mathscr{R}_+ = [0, \infty)$. Then $f$ is a ch.f.


PROOF. Without loss of generality we may suppose that

$$f(\infty) = \lim_{t\to\infty} f(t) = 0;$$

otherwise we consider $[f(t) - f(\infty)]/[f(0) - f(\infty)]$ unless $f(\infty) = 1$, in
which case $f \equiv 1$. It is well known that a convex function $f$ has right-hand
and left-hand derivatives everywhere that are equal except on a countable set,
that $f$ is the integral of either one of them, say the right-hand one, which will
be denoted simply by $f'$, and that $f'$ is increasing. Under the conditions of
the theorem it is also clear that $f$ is decreasing and $f'$ is negative in $\mathscr{R}_+$.
Now consider the $f_T$ as defined in (5) above, and observe that

$$-f_T'(t) = \begin{cases} -\Bigl(1 - \dfrac{t}{T}\Bigr) f'(t) + \dfrac{1}{T} f(t), & \text{if } 0 < t < T; \\[4pt] 0, & \text{if } t \ge T. \end{cases}$$

Thus $-f_T'$ is positive and decreasing in $\mathscr{R}_+$. We have for each $x \ne 0$:

$$\begin{aligned}
\int_{-\infty}^{\infty} e^{-itx} f_T(t)\,dt &= 2\int_0^\infty \cos tx\,f_T(t)\,dt = \frac{2}{x}\int_0^\infty \sin tx\,(-f_T'(t))\,dt \\
&= \frac{2}{x}\sum_{k=0}^{\infty}\int_{k\pi/x}^{(k+1)\pi/x} \sin tx\,(-f_T'(t))\,dt.
\end{aligned}$$

The terms of the series alternate in sign, beginning with a positive one, and
decrease in magnitude, hence the sum $\ge 0$. [This is the argument indicated
for formula (1) of Sec. 6.2.] For $x = 0$, it is trivial that

$$\int_{-\infty}^{\infty} f_T(t)\,dt \ge 0.$$

We have therefore proved that the $p_T$ defined in (4) is positive everywhere,
and the proof there shows that $f$ is a ch.f. (cf. Exercise 1 below).
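
Pólya's conditions lend themselves to a quick numerical screening. The sketch below is an illustrative addition: the grid, the tolerance and the function name are assumptions, and passing the check is only numerical evidence, not a proof. It tests the hypotheses of Theorem 6.5.3 on a finite grid. The Gaussian ch.f. fails the convexity test near 0, which is a reminder that the condition is sufficient and not necessary.

```python
import math

def looks_like_polya(f, t_max=10.0, steps=2000, tol=1e-12):
    """Grid check of (9) plus monotonicity and convexity of f on [0, t_max]."""
    h = t_max / steps
    vals = [f(m * h) for m in range(steps + 1)]
    if abs(vals[0] - 1.0) > tol or min(vals) < -tol:
        return False  # f(0) = 1 or f >= 0 fails
    if any(abs(f(-m * h) - vals[m]) > tol for m in range(steps + 1)):
        return False  # f is not even
    decreasing = all(vals[m + 1] <= vals[m] + tol for m in range(steps))
    # Convexity via nonnegative second differences.
    convex = all(
        vals[m - 1] - 2.0 * vals[m] + vals[m + 1] >= -tol
        for m in range(1, steps)
    )
    return decreasing and convex

def reciprocal(t):
    """1 / (1 + |t|): satisfies Polya's conditions."""
    return 1.0 / (1.0 + abs(t))

def gaussian_chf(t):
    """exp(-t^2 / 2): a ch.f. that is concave near 0."""
    return math.exp(-t * t / 2.0)
```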


Next we will establish an interesting class of ch.f.’s which are natural 
extensions of the ch.f.’s corresponding to the normal and the Cauchy distri- 
butions. 


Theorem 6.5.4. For each $\alpha$ in the range $(0, 2]$,

$$f_\alpha(t) = e^{-|t|^\alpha}$$

is a ch.f.


PROOF. For $0 < \alpha \le 1$, this is a quick consequence of Pólya's theorem
above. Other conditions there being obviously satisfied, we need only check
that $f_\alpha$ is convex in $[0, \infty)$. This is true because its second derivative is
equal to

$$e^{-t^\alpha}\{\alpha^2 t^{2\alpha - 2} - \alpha(\alpha - 1)t^{\alpha - 2}\} > 0$$

for the range of $\alpha$ in question. No such luck for $1 < \alpha < 2$, and there are
several different proofs in this case. Here is the one given by Lévy which
works for $0 < \alpha < 2$. Consider the density function

$$p(x) = \begin{cases} \dfrac{\alpha}{2|x|^{\alpha + 1}}, & \text{if } |x| > 1; \\[4pt] 0, & \text{if } |x| \le 1; \end{cases}$$

and compute its ch.f. $f$ as follows, using symmetry:

$$\begin{aligned}
1 - f(t) &= \int_{-\infty}^{\infty} (1 - e^{itx})\,p(x)\,dx = \alpha\int_1^\infty \frac{1 - \cos tx}{x^{\alpha + 1}}\,dx \\
&= \alpha|t|^\alpha\Bigl\{\int_0^\infty \frac{1 - \cos u}{u^{\alpha + 1}}\,du - \int_0^{|t|} \frac{1 - \cos u}{u^{\alpha + 1}}\,du\Bigr\},
\end{aligned}$$

after the change of variables $|t|x = u$. Since $1 - \cos u \sim \tfrac12 u^2$ near $u = 0$, the first
integral in the last member above is finite while the second is asymptotically
equivalent to

$$\frac{1}{2}\int_0^{t} \frac{u^2}{u^{\alpha + 1}}\,du = \frac{t^{2 - \alpha}}{2(2 - \alpha)}$$

as $t \downarrow 0$. Therefore we obtain

$$f(t) = 1 - c_\alpha|t|^\alpha + O(|t|^2)$$

where $c_\alpha$ is a positive constant depending on $\alpha$.
It now follows that

$$f\Bigl(\frac{t}{n^{1/\alpha}}\Bigr)^n = \Bigl\{1 - \frac{c_\alpha|t|^\alpha}{n} + O\Bigl(\frac{|t|^2}{n^{2/\alpha}}\Bigr)\Bigr\}^n$$

is also a ch.f. (What is the probabilistic meaning in terms of r.v.'s?) For each
$t$, as $n \to \infty$, the limit is equal to $e^{-c_\alpha|t|^\alpha}$ (the lemma in Sec. 6.4 again!). This,
being continuous at $t = 0$, is also a ch.f. by the basic Theorem 6.3.2, and the
constant $c_\alpha$ may be absorbed by a change of scale. Finally, for $\alpha = 2$, $f_\alpha$ is
the ch.f. of a normal distribution. This completes the proof of the theorem.
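
The limiting step in Lévy's argument can be watched numerically. In the sketch below, added for illustration, the $O$-term is replaced by a concrete stand-in $K t^2 / n^{2/\alpha}$; the constants `c` and `K` are hypothetical, since the proof only uses that such a bound exists. The point is that $n$ times this remainder is of order $n^{1 - 2/\alpha}$, which tends to 0 exactly because $\alpha < 2$.

```python
import math

def nth_power_value(t, alpha, c, K, n):
    """(1 - c|t|^alpha / n + K t^2 / n^(2/alpha))^n, computed via log1p for accuracy."""
    u = -c * abs(t) ** alpha / n + K * t * t / n ** (2.0 / alpha)
    return math.exp(n * math.log1p(u))

def limit_value(t, alpha, c):
    """The claimed limit exp(-c |t|^alpha)."""
    return math.exp(-c * abs(t) ** alpha)

def gap(t, alpha, c, K, n):
    """Distance between the n-th power and its limit."""
    return abs(nth_power_value(t, alpha, c, K, n) - limit_value(t, alpha, c))
```

For $\alpha = 1.5$ the convergence is slow (the error decays like $n^{-1/3}$), which `gap(1.0, 1.5, 1.0, 1.0, n)` makes visible for increasing `n`.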


Actually Lévy, who discovered these ch.f.'s around 1923, proved also
that there are complex constants $\gamma_\alpha$ such that $e^{-\gamma_\alpha|t|^\alpha}$ is a ch.f., and determined
the exact form of these constants (see Gnedenko and Kolmogorov [12]). The
corresponding d.f.'s are called stable distributions, and those with real positive
$\gamma_\alpha$ the symmetric stable ones. The parameter $\alpha$ is called the exponent.
These distributions are part of a much larger class called the infinitely divisible
distributions to be discussed in Chapter 7.

Using the Cauchy ch.f. $e^{-|t|}$ we can settle a question of historical interest.
Draw the graph of this function, choose an arbitrary $T > 0$, and draw the
tangents at $\pm T$ meeting the abscissa axis at $\pm T'$, where $T' > T$. Now define
the function $f_T$ to be $f$ in $[-T, T]$, linear in $[-T', -T]$ and in $[T, T']$, and
zero in $(-\infty, -T')$ and $(T', \infty)$. Clearly $f_T$ also satisfies the conditions of
Theorem 6.5.3 and so is a ch.f. Furthermore, $f = f_T$ in $[-T, T]$. We have
thus established the following theorem and shown indeed how abundant the
desired examples are.
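
The construction is explicit enough to code directly. For $f(t) = e^{-|t|}$ the tangent at $T$ has slope $-e^{-T}$ and so meets the axis at $T' = T + 1$. The sketch below (function names are hypothetical) builds $f_T$ and can be used to confirm that it agrees with $f$ on $[-T, T]$ and differs outside.

```python
import math

def cauchy_chf(t):
    """The Cauchy ch.f. exp(-|t|)."""
    return math.exp(-abs(t))

def tangent_modified_chf(t, T):
    """exp(-|t|) on [-T, T], then the tangent line at |t| = T down to zero at T' = T + 1."""
    a = abs(t)
    if a <= T:
        return math.exp(-a)
    if a <= T + 1.0:
        return math.exp(-T) * (T + 1.0 - a)
    return 0.0
```

The modified function remains decreasing and convex on $[0, \infty)$, since a convex function continued along one of its tangents, and then along the axis, stays convex.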


Theorem 6.5.5. There exist two distinct ch.f.’s that coincide in an interval 
containing the origin. 


That the interval of coincidence can be "arbitrarily large" is, of course,
trivial, for if $f_1 = f_2$ in $[-\delta, \delta]$, then $g_1 = g_2$ in $[-n\delta, n\delta]$, where

$$g_1(t) = f_1\Bigl(\frac{t}{n}\Bigr), \qquad g_2(t) = f_2\Bigl(\frac{t}{n}\Bigr).$$

Corollary. There exist three ch.f.'s, $f_1$, $f_2$, $f_3$, such that $f_1 f_3 = f_2 f_3$ but
$f_1 \ne f_2$.

To see this, take $f_1$ and $f_2$ as in the theorem, and take $f_3$ to be any
ch.f. vanishing outside their interval of coincidence, such as the one described
above for a sufficiently small value of $T$. This result shows that in the algebra
of ch.f.'s, the cancellation law does not hold.

We end this section by another special but useful way of constructing 
ch.f.’s. 


Theorem 6.5.6. If $f$ is a ch.f., then so is $e^{\lambda(f - 1)}$ for each $\lambda \ge 0$.

PROOF. For each $\lambda \ge 0$, as soon as the integer $n \ge \lambda$, the function

$$1 - \frac{\lambda}{n} + \frac{\lambda}{n} f = 1 + \frac{\lambda(f - 1)}{n}$$

is a ch.f., hence so is its $n$th power; see propositions (iv) and (v) of Sec. 6.1.
As $n \to \infty$,

$$\Bigl(1 + \frac{\lambda(f - 1)}{n}\Bigr)^n \to e^{\lambda(f - 1)}$$

and the limit is clearly continuous. Hence it is a ch.f. by Theorem 6.3.2.
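
A small numerical companion to this proof, added for illustration: with the assumed choice $f(t) = e^{it}$ (the ch.f. of the unit mass at 1), the sketch compares the $n$th power with its limit, and the limit with a truncated series $\sum_k e^{-\lambda}\lambda^k e^{itk}/k!$. The function names and the truncation length are assumptions of the example.

```python
import cmath
import math

def compound_limit(f_value, lam):
    """exp(lam * (f - 1)) evaluated at a single value f = f(t)."""
    return cmath.exp(lam * (f_value - 1.0))

def nth_power(f_value, lam, n):
    """(1 + lam * (f - 1) / n)^n; a ch.f. value once n >= lam."""
    return (1.0 + lam * (f_value - 1.0) / n) ** n

def poisson_series(t, lam, terms=80):
    """sum over k of e^{-lam} lam^k / k! * e^{itk}, truncated after `terms` terms."""
    total = 0j
    weight = math.exp(-lam)  # coefficient for k = 0
    for k in range(terms):
        total += weight * cmath.exp(1j * t * k)
        weight *= lam / (k + 1)
    return total
```

Here `compound_limit(cmath.exp(1j * t), lam)` and `poisson_series(t, lam)` should agree to rounding error, while `nth_power` approaches them at rate $1/n$.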
Later we shall see that this class of ch.f.'s is also part of the infinitely
divisible family. For $f(t) = e^{it}$, the corresponding

$$e^{\lambda(e^{it} - 1)} = \sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^n}{n!}\,e^{itn}$$

is the ch.f. of the Poisson distribution which should be familiar to the reader.

EXERCISES


1. If $f$ is continuous in $\mathscr{R}^1$ and satisfies (3) for each $x \in \mathscr{R}^1$ and each
$T > 0$, then $f$ is positive definite.


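2. Show that the following functions are ch.f.'s:

$$\frac{1}{1 + |t|}; \qquad f(t) = \begin{cases} 1 - |t|^\alpha, & \text{if } |t| \le 1; \\ 0, & \text{if } |t| \ge 1; \end{cases} \quad 0 < \alpha \le 1;$$

$$f(t) = \begin{cases} 1 - |t|, & \text{if } 0 \le |t| \le \tfrac12; \\[4pt] \dfrac{1}{4|t|}, & \text{if } |t| \ge \tfrac12. \end{cases}$$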

3. If $\{X_n\}$ are independent r.v.'s with the same stable distribution of
exponent $\alpha$, then $\sum_{k=1}^{n} X_k / n^{1/\alpha}$ has the same distribution. [This is the origin
of the name "stable".]


4. If $F$ is a symmetric stable distribution of exponent $\alpha$, $0 < \alpha < 2$, then
$\int_{-\infty}^{\infty} |x|^r\,dF(x) < \infty$ for $r < \alpha$ and $= \infty$ for $r \ge \alpha$. [HINT: Use Exercises 7
and 8 of Sec. 6.4.]


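*5. Another proof of Theorem 6.5.3 is as follows. Show that

$$\int_0^\infty t\,df'(t) = 1$$

and define the d.f. $G$ on $\mathscr{R}_+$ by

$$G(u) = \int_{[0,u]} t\,df'(t).$$

Next show that, for $t \ge 0$,

$$\int_t^\infty \Bigl(1 - \frac{t}{u}\Bigr)\,dG(u) = f(t).$$

Hence if we set

$$f(u, t) = \Bigl(1 - \frac{|t|}{u}\Bigr) \vee 0$$

(see Exercise 2 of Sec. 6.2), then

$$f(t) = \int_{[0,\infty)} f(u, t)\,dG(u).$$

Now apply Exercise 2 of Sec. 6.1.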

6. Show that there is a ch.f. that has period $2m$, $m$ an integer $\ge 1$, and
that is equal to $1 - |t|$ in $[-1, +1]$. [HINT: Compute the Fourier series of such
a function to show that the coefficients are positive.]

7. Construct a ch.f. that vanishes in $[-b, -a]$ and $[a, b]$, where $0 < a < b$,
but nowhere else. [HINT: Let $f_m$ be the ch.f. in Exercise 6 and consider

$$\sum_m p_m f_m, \quad\text{where } p_m \ge 0,\ \sum_m p_m = 1;$$

and the $p_m$'s are strategically chosen.]


8. Suppose $f(t, u)$ is a function on $\mathscr{R}^2$ such that for each $u$, $f(\cdot, u)$ is a
ch.f.; and for each $t$, $f(t, \cdot)$ is continuous. Then for any d.f. $G$,

$$\exp\Bigl\{\int_{-\infty}^{\infty} [f(t, u) - 1]\,dG(u)\Bigr\}$$

is a ch.f.


9. Show that in Theorem 6.3.2, the hypothesis (a) cannot be relaxed to
require convergence of $\{f_n\}$ only in a finite interval $|t| \le T$.


6.6 Multidimensional case; Laplace transforms 


We will discuss very briefly the ch.f. of a p.m. in Euclidean space of more than
one dimension, which we will take to be two since all extensions are straightforward.
The ch.f. of the random vector $(X, Y)$ or of its 2-dimensional p.m.
$\mu$ is defined to be the function $f(\cdot, \cdot)$ on $\mathscr{R}^2$:

$$f(s, t) = f_{(X,Y)}(s, t) = \mathscr{E}(e^{i(sX + tY)}) = \iint_{\mathscr{R}^2} e^{i(sx + ty)}\,\mu(dx, dy). \tag{1}$$

Propositions (i) to (v) of Sec. 6.1 have their obvious analogues. The inversion
formula may be formulated as follows. Call an "interval" (rectangle)

$$\{(x, y): x_1 \le x \le x_2,\ y_1 \le y \le y_2\}$$

an interval of continuity iff the $\mu$-measure of its boundary (the set of points
on its four sides) is zero. For such an interval $I$, we have

$$\mu(I) = \lim_{T\to\infty} \frac{1}{(2\pi)^2}\int_{-T}^{T}\!\!\int_{-T}^{T} \frac{e^{-isx_1} - e^{-isx_2}}{is}\cdot\frac{e^{-ity_1} - e^{-ity_2}}{it}\,f(s, t)\,ds\,dt.$$

The proof is entirely similar to that in one dimension. It follows, as there, that
$f$ uniquely determines $\mu$. Only the following result is noteworthy.


Theorem 6.6.1. Two r.v.'s $X$ and $Y$ are independent if and only if

$$\forall s, \forall t: \quad f_{(X,Y)}(s, t) = f_X(s) f_Y(t), \tag{2}$$

where $f_X$ and $f_Y$ are the ch.f.'s of $X$ and $Y$, respectively.

The condition (2) is to be contrasted with the following identity in one
variable:

$$\forall t: \quad f_{X+Y}(t) = f_X(t) f_Y(t),$$

where $f_{X+Y}$ is the ch.f. of $X + Y$. This holds if $X$ and $Y$ are independent, but
the converse is false (see Exercise 14 of Sec. 6.1).


PROOF OF THEOREM 6.6.1. If $X$ and $Y$ are independent, then so are $e^{isX}$ and
$e^{itY}$ for every $s$ and $t$, hence

$$\mathscr{E}(e^{i(sX + tY)}) = \mathscr{E}(e^{isX} \cdot e^{itY}) = \mathscr{E}(e^{isX})\,\mathscr{E}(e^{itY}),$$

which is (2). Conversely, consider the 2-dimensional product measure $\mu_1 \times \mu_2$,
where $\mu_1$ and $\mu_2$ are the 1-dimensional p.m.'s of $X$ and $Y$, respectively, and
the product is defined in Sec. 3.3. Its ch.f. is given by definition as

$$\begin{aligned}
\iint_{\mathscr{R}^2} e^{i(sx + ty)}\,(\mu_1 \times \mu_2)(dx, dy) &= \iint_{\mathscr{R}^2} e^{isx} \cdot e^{ity}\,\mu_1(dx)\,\mu_2(dy) \\
&= \int_{\mathscr{R}^1} e^{isx}\,\mu_1(dx) \int_{\mathscr{R}^1} e^{ity}\,\mu_2(dy) = f_X(s) f_Y(t)
\end{aligned}$$

(Fubini's theorem!). If (2) is true, then this is the same as $f_{(X,Y)}(s, t)$, so that
$\mu_1 \times \mu_2$ has the same ch.f. as $\mu$, the p.m. of $(X, Y)$. Hence, by the uniqueness
theorem mentioned above, we have $\mu_1 \times \mu_2 = \mu$. This is equivalent to the
independence of $X$ and $Y$.
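
Condition (2) can be tried out on small discrete examples. The sketch below is an illustrative addition; the two joint distributions and the sample points $(s, t)$ are assumptions of the example. It computes the joint ch.f. from a finite table and tests the factorization at a few points. A failure at one point disproves independence, whereas agreement at finitely many points is merely consistent with it.

```python
import cmath

def joint_chf(pmf, s, t):
    """f(s, t) = E exp(i(sX + tY)) for a finite joint pmf {(x, y): prob}."""
    return sum(p * cmath.exp(1j * (s * x + t * y)) for (x, y), p in pmf.items())

def marginal_chfs(pmf, s, t):
    """Return (f_X(s), f_Y(t)), obtained by setting the other argument to 0."""
    return joint_chf(pmf, s, 0.0), joint_chf(pmf, 0.0, t)

def factorizes(pmf, points, tol=1e-12):
    """Check condition (2) at the given (s, t) points only."""
    for s, t in points:
        fx, fy = marginal_chfs(pmf, s, t)
        if abs(joint_chf(pmf, s, t) - fx * fy) > tol:
            return False
    return True

# Two fair coins tossed independently, and the degenerate coupling Y = X.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
dependent = {(0, 0): 0.5, (1, 1): 0.5}
points = [(0.3, -1.1), (1.0, 2.0), (-0.7, 0.4)]
```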


The multidimensional analogues of the convergence theorem, and some 
of its applications such as the weak law of large numbers and central limit 
theorem, are all valid without new difficulties. Even the characterization of 
Bochner has an easy extension. However, these topics are better pursued with 
certain objectives in view, such as mathematical statistics, Gaussian processes, 
and spectral analysis, that are beyond the scope of this book, and so we shall 
not enter into them. 

We shall, however, give a short introduction to an allied notion, that of 
the Laplace transform, which is sometimes more expedient or appropriate than 
the ch.f., and is a basic tool in the theory of Markov processes. 

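Let $X$ be a positive ($\ge 0$) r.v. having the d.f. $F$ so that $F$ has support in
$[0, \infty)$, namely $F(0-) = 0$. The Laplace transform of $X$ or $F$ is the function
$\hat F$ on $\mathscr{R}_+ = [0, \infty)$ given by

$$\hat F(\lambda) = \mathscr{E}(e^{-\lambda X}) = \int_{[0,\infty)} e^{-\lambda x}\,dF(x). \tag{3}$$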

It is obvious (why?) that

$$\hat F(0) = \lim_{\lambda \downarrow 0} \hat F(\lambda) = 1, \qquad \hat F(\infty) = \lim_{\lambda \to \infty} \hat F(\lambda) = F(0).$$

More generally, we can define the Laplace transform of an s.d.f. or a function
$G$ of bounded variation satisfying certain "growth condition" at infinity. In
particular, if $F$ is an s.d.f., the Laplace transform of its indefinite integral

$$G(x) = \int_0^x F(u)\,du$$

is finite for $\lambda > 0$ and given by

$$\begin{aligned}
\hat G(\lambda) &= \int_{[0,\infty)} e^{-\lambda x} F(x)\,dx = \int_{[0,\infty)} e^{-\lambda x}\,dx \int_{[0,x]} \mu(dy) \\
&= \int_{[0,\infty)} \mu(dy) \int_y^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda}\int_{[0,\infty)} e^{-\lambda y}\,\mu(dy) = \frac{1}{\lambda}\hat F(\lambda),
\end{aligned}$$

where $\mu$ is the s.p.m. of $F$. The calculation above, based on Fubini's theorem,
replaces a familiar "integration by parts". However, the reader should beware
of the latter operation. For instance, according to the usual definition, as
given in Rudin [1] (and several other standard textbooks!), the value of the
Riemann–Stieltjes integral

$$\int_0^\infty e^{-\lambda x}\,d\delta_0(x)$$

is 0 rather than 1, but

$$\int_0^\infty e^{-\lambda x}\,d\delta_0(x) = \lim_{A \uparrow \infty} \delta_0(x)\,e^{-\lambda x}\Big|_0^A + \int_0^\infty \delta_0(x)\,\lambda e^{-\lambda x}\,dx$$

is correct only if the left member is taken in the Lebesgue–Stieltjes sense, as
is always done in this book.
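
As a concrete check of the relation $\hat G(\lambda) = \lambda^{-1}\hat F(\lambda)$ derived above, here is a worked instance added for illustration; the choice of the exponential d.f. is an assumption of the example.

```latex
\begin{align*}
F(x) &= 1 - e^{-x} \quad (x \ge 0), \\
\hat F(\lambda) &= \int_0^\infty e^{-\lambda x} e^{-x}\,dx = \frac{1}{1+\lambda}, \\
\hat G(\lambda) &= \int_0^\infty e^{-\lambda x}\bigl(1 - e^{-x}\bigr)\,dx
  = \frac{1}{\lambda} - \frac{1}{1+\lambda} = \frac{1}{\lambda(1+\lambda)} = \frac{1}{\lambda}\hat F(\lambda).
\end{align*}
```

Since this $F$ is continuous at 0, one also reads off $\hat F(\infty) = 0 = F(0)$.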

There are obvious analogues of propositions (i) to (v) of Sec. 6.1. 
However, the inversion formula requiring complex integration will be omitted 
and the uniqueness theorem, due to Lerch, will be proved by a different method 
(cf. Exercise 12 of Sec. 6.2). 


Theorem 6.6.2. Let $\hat F_j$ be the Laplace transform of the d.f. $F_j$ with support
in $\mathscr{R}_+$, $j = 1, 2$. If $\hat F_1 = \hat F_2$, then $F_1 = F_2$.


PROOF. We shall apply the Stone–Weierstrass theorem to the algebra
generated by the family of functions $\{e^{-\lambda x}, \lambda \ge 0\}$, defined on the closed
positive real line: $\overline{\mathscr{R}}_+ = [0, \infty]$, namely the one-point compactification of
$\mathscr{R}_+ = [0, \infty)$. A continuous function of $x$ on $\overline{\mathscr{R}}_+$ is one that is continuous
in $\mathscr{R}_+$ and has a finite limit as $x \to \infty$. This family separates points on $\overline{\mathscr{R}}_+$
and vanishes at no point of $\overline{\mathscr{R}}_+$ (at the point $+\infty$, the member $e^{-0x} = 1$ of the
family does not vanish!). Hence the set of polynomials in the members of the
family, namely the algebra generated by it, is dense in the uniform topology,
in the space $C_B(\overline{\mathscr{R}}_+)$ of bounded continuous functions on $\overline{\mathscr{R}}_+$. That is to say,
given any $g \in C_B(\overline{\mathscr{R}}_+)$, and $\epsilon > 0$, there exists a polynomial of the form

$$g_\epsilon(x) = \sum_{j=1}^{n} c_j e^{-\lambda_j x},$$

where $c_j$ are real and $\lambda_j \ge 0$, such that

$$\sup_{x \in \overline{\mathscr{R}}_+} |g(x) - g_\epsilon(x)| \le \epsilon.$$

Consequently, we have

$$\int |g(x) - g_\epsilon(x)|\,dF_j(x) \le \epsilon, \quad j = 1, 2.$$

By hypothesis, we have for each $\lambda \ge 0$:

$$\int e^{-\lambda x}\,dF_1(x) = \int e^{-\lambda x}\,dF_2(x),$$

and consequently,

$$\int g_\epsilon(x)\,dF_1(x) = \int g_\epsilon(x)\,dF_2(x).$$

It now follows, as in the proof of Theorem 4.4.1, first that

$$\int g(x)\,dF_1(x) = \int g(x)\,dF_2(x)$$

for each $g \in C_B(\overline{\mathscr{R}}_+)$; second, that this also holds for each $g$ that is the
indicator of an interval in $\overline{\mathscr{R}}_+$ (even one of the form $(a, \infty]$); third, that
the two p.m.'s induced by $F_1$ and $F_2$ are identical, and finally that $F_1 = F_2$
as asserted.


Remark. Although it is not necessary here, we may also extend the
domain of definition of a d.f. $F$ to $\overline{\mathscr{R}}_+$; thus $F(\infty) = 1$, but with the new
meaning that $F(\infty)$ is actually the value of $F$ at the point $\infty$, rather than
a notation for $\lim_{x\to\infty} F(x)$ as previously defined. $F$ is thus continuous at
$\infty$. In terms of the p.m. $\mu$, this means we extend its domain to $\overline{\mathscr{R}}_+$ but set
$\mu(\{\infty\}) = 0$. On other occasions it may be useful to assign a strictly positive
value to $\mu(\{\infty\})$.


Passing to the convergence theorem, we shall prove it in a form closest 
to Theorem 6.3.2 and refer to Exercise 4 below for improvements. 


Theorem 6.6.3. Let $\{F_n, 1 \le n < \infty\}$ be a sequence of s.d.f.'s with supports
in $\mathscr{R}_+$ and $\{\hat F_n\}$ the corresponding Laplace transforms. Then $F_n \xrightarrow{v} F_\infty$, where
$F_\infty$ is a d.f., if and only if:

(a) $\lim_{n\to\infty} \hat F_n(\lambda)$ exists for every $\lambda > 0$;
(b) the limit function has the limit 1 as $\lambda \downarrow 0$.

Remark. The limit in (a) exists at $\lambda = 0$ and equals 1 if the $F_n$'s are
d.f.'s, but even so (b) is not guaranteed, as the example $F_n = \delta_n$ shows.
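
The example in the remark can be written out in one line; the computation below is added for illustration.

```latex
\begin{align*}
F_n = \delta_n:\qquad \hat F_n(\lambda) &= \int_{[0,\infty)} e^{-\lambda x}\,d\delta_n(x) = e^{-\lambda n}, \\
\lim_{n\to\infty} \hat F_n(\lambda) &= \begin{cases} 1, & \lambda = 0; \\ 0, & \lambda > 0. \end{cases}
\end{align*}
```

Thus (a) holds, but the limit function tends to 0, not 1, as $\lambda \downarrow 0$: the unit mass escapes to infinity and the vague limit of $F_n$ is identically zero, which is an s.d.f. but not a d.f.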


PROOF. The "only if" part follows at once from Theorem 4.4.1 and the
remark that the Laplace transform of an s.d.f. is a continuous function in $\mathscr{R}_+$.
Conversely, suppose $\lim \hat F_n(\lambda) = G(\lambda)$, $\lambda > 0$; extend $G$ to $\mathscr{R}_+$ by setting
$G(0) = 1$ so that $G$ is continuous in $\mathscr{R}_+$ by hypothesis (b). As in the proof
of Theorem 6.3.2, consider any vaguely convergent subsequence $F_{n_k}$ with the
vague limit $F_\infty$, necessarily an s.d.f. (see Sec. 4.3). Since for each $\lambda > 0$,
$e^{-\lambda x} \in C_0$, Theorem 4.4.1 applies to yield $\hat F_{n_k}(\lambda) \to \hat F_\infty(\lambda)$ for $\lambda > 0$, where
$\hat F_\infty$ is the Laplace transform of $F_\infty$. Thus $\hat F_\infty(\lambda) = G(\lambda)$ for $\lambda > 0$, and
consequently for $\lambda \ge 0$ by continuity of $\hat F_\infty$ and $G$ at $\lambda = 0$. Hence every
vague limit has the same Laplace transform, and therefore is the same $F_\infty$ by
Theorem 6.6.2. It follows by Theorem 4.3.4 that $F_n \xrightarrow{v} F_\infty$. Finally, we have
$F_\infty(\infty) = \hat F_\infty(0) = G(0) = 1$, proving that $F_\infty$ is a d.f.


There is a useful characterization, due to S. Bernstein, of Laplace transforms
of measures on $\mathscr{R}_+$. A function is called completely monotonic in an
interval (finite or infinite, of any kind) iff it has derivatives of all orders there
satisfying the condition:

$$(-1)^n f^{(n)}(\lambda) \ge 0 \tag{4}$$

for each $n \ge 0$ and each $\lambda$ in the domain of definition.


Theorem 6.6.4. A function $f$ on $(0, \infty)$ is the Laplace transform of a d.f. $F$:

$$f(\lambda) = \int_{\mathscr{R}_+} e^{-\lambda x}\,dF(x), \tag{5}$$

if and only if it is completely monotonic in $(0, \infty)$ with $f(0+) = 1$.

Remark. We then extend $f$ to $\mathscr{R}_+$ by setting $f(0) = 1$, and (5) will
hold for $\lambda \ge 0$.


PROOF. The "only if" part is immediate, since

$$f^{(n)}(\lambda) = \int_{\mathscr{R}_+} (-x)^n e^{-\lambda x}\,dF(x).$$

Turning to the "if" part, let us first prove that $f$ is quasi-analytic in $(0, \infty)$,
namely it has a convergent Taylor series there. Let $0 < \lambda_0 < \lambda < \mu$, then, by
Taylor's theorem, with the remainder term in the integral form, we have

$$f(\lambda) = \sum_{j=0}^{k-1} \frac{f^{(j)}(\mu)}{j!}(\lambda - \mu)^j + \frac{(\lambda - \mu)^k}{(k-1)!}\int_0^1 (1 - t)^{k-1} f^{(k)}(\mu + (\lambda - \mu)t)\,dt. \tag{6}$$

Because of (4), the last term in (6) is positive and does not exceed

$$\frac{(\lambda - \mu)^k}{(k-1)!}\int_0^1 (1 - t)^{k-1} f^{(k)}(\mu + (\lambda_0 - \mu)t)\,dt.$$

For if $k$ is even, then $f^{(k)} \downarrow$ and $(\lambda - \mu)^k \ge 0$, while if $k$ is odd then $f^{(k)} \uparrow$
and $(\lambda - \mu)^k \le 0$. Now, by (6) with $\lambda$ replaced by $\lambda_0$, the last expression is
equal to

$$\Bigl(\frac{\lambda - \mu}{\lambda_0 - \mu}\Bigr)^k \Bigl[f(\lambda_0) - \sum_{j=0}^{k-1} \frac{f^{(j)}(\mu)}{j!}(\lambda_0 - \mu)^j\Bigr] \le \Bigl(\frac{\lambda - \mu}{\lambda_0 - \mu}\Bigr)^k f(\lambda_0),$$

where the inequality is trivial, since each term in the sum on the left is positive
by (4). Therefore, as $k \to \infty$, the remainder term in (6) tends to zero and the
Taylor series for $f(\lambda)$ converges.

Now for each $n \ge 1$, define the discrete s.d.f. $F_n$ by the formula:

$$F_n(x) = \sum_{j=0}^{[nx]} \frac{n^j}{j!}(-1)^j f^{(j)}(n). \tag{7}$$

This is indeed an s.d.f., since for each $\epsilon > 0$ and $k \ge 1$ we have from (6):

$$1 = f(0+) \ge f(\epsilon) \ge \sum_{j=0}^{k-1} \frac{f^{(j)}(n)}{j!}(\epsilon - n)^j.$$

Letting $\epsilon \downarrow 0$ and then $k \uparrow \infty$, we see that $F_n(\infty) \le 1$. The Laplace transform
of $F_n$ is plainly, for $\lambda > 0$:

$$\begin{aligned}
\int e^{-\lambda x}\,dF_n(x) &= \sum_{j=0}^{\infty} e^{-\lambda(j/n)}\,\frac{n^j}{j!}(-1)^j f^{(j)}(n) \\
&= \sum_{j=0}^{\infty} \frac{1}{j!}\bigl(n(1 - e^{-\lambda/n}) - n\bigr)^j f^{(j)}(n) = f\bigl(n(1 - e^{-\lambda/n})\bigr),
\end{aligned}$$

the last equation from the Taylor series. Letting $n \to \infty$, we obtain for the
limit of the last term $f(\lambda)$, since $f$ is continuous at each $\lambda$. It follows from
Theorem 6.6.3 that $\{F_n\}$ converges vaguely, say to $F$, and that the Laplace
transform of $F$ is $f$. Hence $F(\infty) = f(0) = 1$, and $F$ is a d.f. The theorem
is proved.
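
The approximating s.d.f.'s in (7) can be computed in closed form for a simple case. The sketch below is an illustration under an assumed choice, $f(\lambda) = 1/(1+\lambda)$, the Laplace transform of the exponential d.f. $1 - e^{-x}$. Here $f^{(j)}(n) = (-1)^j j!/(1+n)^{j+1}$, so the jump of $F_n$ at $j/n$ is $(n+1)^{-1}(n/(n+1))^j$, a geometric distribution on the lattice $\{j/n\}$. The function names and truncation length are hypothetical.

```python
import math

def f(lam):
    """Laplace transform of the exponential d.f. 1 - exp(-x): f(lam) = 1 / (1 + lam)."""
    return 1.0 / (1.0 + lam)

def F_n(x, n):
    """Formula (7) for this f: the jump at j/n equals (1/(n+1)) * (n/(n+1))^j."""
    if x < 0:
        return 0.0
    r = n / (n + 1.0)
    top = int(math.floor(n * x))
    return sum(r ** j for j in range(top + 1)) / (n + 1.0)

def laplace_of_F_n(lam, n, terms=5000):
    """sum over j of exp(-lam * j / n) times the jump of F_n at j/n (truncated)."""
    q = (n / (n + 1.0)) * math.exp(-lam / n)
    return sum(q ** j for j in range(terms)) / (n + 1.0)
```

With these definitions, `F_n(1.0, 2000)` is close to $1 - e^{-1}$, and `laplace_of_F_n(lam, n)` reproduces $f(n(1 - e^{-\lambda/n}))$, as in the displayed computation.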


EXERCISES 


1. If $X$ and $Y$ are independent r.v.'s with normal d.f.'s of the same
variance, then $X + Y$ and $X - Y$ are independent.


*2. Two uncorrelated r.v.’s with a joint normal d.f. of arbitrary parameters 
are independent. Extend this to any finite number of r.v.’s. 


*3. Let $F$ and $G$ be s.d.f.'s. If $\lambda_0 > 0$ and $\hat F(\lambda) = \hat G(\lambda)$ for all $\lambda \ge \lambda_0$, then
$F = G$. More generally, if $\hat F(n\lambda_0) = \hat G(n\lambda_0)$ for integer $n \ge 1$, then $F = G$.
[HINT: In order to apply the Stone–Weierstrass theorem as cited, adjoin the
constant 1 to the family $\{e^{-\lambda x}, \lambda \ge \lambda_0\}$; show that a function in $C_0$ can actually
be uniformly approximated without a constant.]


*4. Let $\{F_n\}$ be s.d.f.'s. If $\lambda_0 > 0$ and $\lim_{n\to\infty} \hat F_n(\lambda)$ exists for all $\lambda \ge \lambda_0$,
then $\{F_n\}$ converges vaguely to an s.d.f.

5. Use Exercise 3 to prove that for any d.f. whose support is in a finite
interval, the moment problem is determinate.

6. Let $F$ be an s.d.f. with support in $\mathscr{R}_+$. Define $G_0 = F$,

$$G_n(x) = \int_0^x G_{n-1}(u)\,du$$

for $n \ge 1$. Find $\hat G_n(\lambda)$ in terms of $\hat F(\lambda)$.


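7. Let $\hat f(\lambda) = \int_0^\infty e^{-\lambda x} f(x)\,dx$ where $f \in L^1(0, \infty)$. Suppose that $f$ has
a finite right-hand derivative $f'(0)$ at the origin, then

$$f(0) = \lim_{\lambda\to\infty} \lambda \hat f(\lambda),$$

$$f'(0) = \lim_{\lambda\to\infty} \lambda[\lambda \hat f(\lambda) - f(0)].$$

*8. In the notation of Exercise 7, prove that for every $\lambda, \mu \in \mathscr{R}_+$:

$$(\mu - \lambda)\int_0^\infty\!\!\int_0^\infty e^{-(\lambda s + \mu t)} f(s + t)\,ds\,dt = \hat f(\lambda) - \hat f(\mu).$$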


9. Given a function $\sigma$ on $\mathscr{R}_+$ that is finite, continuous, and decreasing
to zero at infinity, find a $\sigma$-finite measure $\mu$ on $\mathscr{R}_+$ such that

$$\forall t > 0: \quad \int_{[0,t]} \sigma(t - s)\,\mu(ds) = 1.$$

[HINT: Assume $\sigma(0) = 1$ and consider the d.f. $1 - \sigma$.]

*10. If $f > 0$ on $(0, \infty)$ and has a derivative $f'$ that is completely monotonic
there, then $1/f$ is also completely monotonic.

11. If $f$ is completely monotonic in $(0, \infty)$ with $f(0+) = +\infty$, then $f$
is the Laplace transform of an infinite measure $\mu$ on $\mathscr{R}_+$:

$$f(\lambda) = \int_{\mathscr{R}_+} e^{-\lambda x}\,\mu(dx).$$

[HINT: Show that $F_n(x) \le e^{2x\delta} f(\delta)$ for each $\delta > 0$ and all large $n$, where $F_n$
is defined in (7). Alternatively, apply Theorem 6.6.4 to $f(\lambda + n^{-1})/f(n^{-1})$
for $\lambda \ge 0$ and use Exercise 3.]

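12. Let $\{g_n, 1 \le n \le \infty\}$ on $\mathscr{R}_+$ satisfy the conditions: (i) for each
$n$, $g_n(\cdot)$ is positive and decreasing; (ii) $g_\infty(x)$ is continuous; (iii) for each
$\lambda > 0$,

$$\lim_{n\to\infty}\int_0^\infty e^{-\lambda x} g_n(x)\,dx = \int_0^\infty e^{-\lambda x} g_\infty(x)\,dx.$$

Then

$$\lim_{n\to\infty} g_n(x) = g_\infty(x) \quad\text{for every } x \in \mathscr{R}_+.$$

[HINT: For $\epsilon > 0$ consider the sequence $\int_0^\infty e^{-\epsilon x} g_n(x)\,dx$ and show that

$$\lim_{n\to\infty}\int_a^b e^{-\epsilon x} g_n(x)\,dx = \int_a^b e^{-\epsilon x} g_\infty(x)\,dx, \qquad \limsup_{n\to\infty} g_n(b) \le g_\infty(b-),$$

and so on.]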


Formally, the Fourier transform $f$ and the Laplace transform $\hat F$ of a p.m.
with support in $\mathscr{R}_+$ can be obtained from each other by the substitution $t = i\lambda$
or $\lambda = -it$ in (1) of Sec. 6.1 or (3) of Sec. 6.6. In practice, a certain expression
may be derived for one of these transforms that is valid only for the pertinent 
range, and the question arises as to its validity for the other range. Interesting 
cases of this will be encountered in Secs. 8.4 and 8.5. The following theorem 
is generally adequate for the situation. 


Theorem 6.6.5. The function $h$ of the complex variable $z$ given by

$$h(z) = \int_{\mathcal{R}_+} e^{zx}\,dF(x)$$

is analytic in $\Re z < 0$ and continuous in $\Re z \le 0$. Suppose that $g$ is another
function of $z$ that is analytic in $\Re z < 0$ and continuous in $\Re z \le 0$ such that

$$\forall t \in \mathcal{R}^1:\ h(it) = g(it).$$

Then $h(z) \equiv g(z)$ in $\Re z \le 0$; in particular

$$\forall \lambda \in \mathcal{R}_+:\ h(-\lambda) = g(-\lambda).$$


PROOF. For each integer $m \ge 1$, the function $h_m$ defined by

$$h_m(z) = \int_{[0,m]} e^{zx}\,dF(x) = \sum_{n=0}^{\infty} \frac{z^n}{n!} \int_{[0,m]} x^n\,dF(x)$$

is clearly an entire function of $z$. We have

$$\left|\int_{(m,\infty)} e^{zx}\,dF(x)\right| \le \int_{(m,\infty)} dF(x)$$

in $\Re z \le 0$; hence the sequence $h_m$ converges uniformly there to $h$, and $h$
is continuous in $\Re z \le 0$. It follows from a basic proposition in the theory of
analytic functions that $h$ is analytic in the interior $\Re z < 0$. Next, the difference
$h - g$ is analytic in $\Re z < 0$, continuous in $\Re z \le 0$, and equal to zero on the
line $\Re z = 0$ by hypothesis. Hence, by Schwarz’s reflection principle, it can be
analytically continued across the line and over the whole complex plane. The
resulting entire function, being zero on a line, must be identically zero in the
plane. In particular $h - g \equiv 0$ in $\Re z \le 0$, proving the theorem.
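
As a numerical sanity check of the passage between the two transforms, the sketch below takes the exponential d.f. $F(x) = 1 - e^{-x}$ on $\mathcal{R}_+$. This choice is an assumption made only for illustration and is not an example from the text. For it, $h(z) = 1/(1 - z)$, so $h(it)$ is the ch.f. and $h(-\lambda)$ the Laplace transform. The function names `h_numeric` and `g` are hypothetical.

```python
import cmath

def h_numeric(z, upper=60.0, steps=60000):
    """Trapezoidal approximation of h(z) = integral_0^inf e^{zx} e^{-x} dx, Re z <= 0.

    Assumes dF(x) = e^{-x} dx (exponential law), truncated at `upper`.
    """
    dx = upper / steps
    total = 0.5 * (1.0 + cmath.exp((z - 1.0) * upper))  # the two endpoint terms
    for k in range(1, steps):
        total += cmath.exp((z - 1.0) * k * dx)
    return total * dx

def g(z):
    """Closed form 1/(1 - z); it agrees with h on the line Re z = 0."""
    return 1.0 / (1.0 - z)

# One point on the imaginary axis (Fourier side), one on the negative real
# axis (Laplace side), and one in the interior of the left half-plane.
for z in (3j, -2.0, -1.0 + 2.0j):
    print(z, abs(h_numeric(z) - g(z)))
```

The agreement at the interior and real points is what Theorem 6.6.5 guarantees once equality is known on the imaginary axis; the code only checks it numerically for this one law.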


Bibliographical Note 


For standard references on ch.f.’s, apart from Lévy [7], [11], Cramér [10], Gnedenko
and Kolmogorov [12], Loève [14], Rényi [15], Doob [16], we mention:

S. Bochner, Vorlesungen über Fouriersche Integrale. Akademische
Verlagsgesellschaft, Konstanz, 1932.

E. Lukacs, Characteristic functions. Charles Griffin, London, 1960.

The proof of Theorem 6.6.4 is taken from

Willy Feller, Completely monotone functions and sequences, Duke J. 5 (1939),
661-674.


7 Central limit theorem and 
its ramifications 


7.1 Liapounov’s theorem 


The name “central limit theorem” refers to a result that asserts the convergence
in dist. of a “normed” sum of r.v.’s, $(S_n - a_n)/b_n$, to the unit normal d.f. $\Phi$.
We have already proved a neat, though special, version in Theorem 6.4.4.
Here we begin by generalizing the set-up. If we write

$$\frac{S_n - a_n}{b_n} = \sum_{j=1}^{n} \frac{X_j}{b_n} - \frac{a_n}{b_n}, \tag{1}$$

we see that we are really dealing with a double array, as follows. For each
$n \ge 1$ let there be $k_n$ r.v.’s $\{X_{nj}, 1 \le j \le k_n\}$, where $k_n \to \infty$ as $n \to \infty$:

$$\begin{array}{l}
X_{11}, X_{12}, \ldots, X_{1k_1}; \\
X_{21}, X_{22}, \ldots, X_{2k_2}; \\
\cdots\cdots\cdots\cdots\cdots \\
X_{n1}, X_{n2}, \ldots, X_{nk_n}; \\
\cdots\cdots\cdots\cdots\cdots
\end{array} \tag{2}$$

The r.v.’s with $n$ as first subscript will be referred to as being in the $n$th row.
Let $F_{nj}$ be the d.f., $f_{nj}$ the ch.f. of $X_{nj}$; and put

$$S_n = S_{n,k_n} = \sum_{j=1}^{k_n} X_{nj}.$$

The particular case $k_n = n$ for each $n$ yields a triangular array, and if, furthermore,
$X_{nj} = X_j$ for every $n$, then it reduces to the initial sections of a single
sequence $\{X_j, j \ge 1\}$.

We shall assume from now on that the r.v.’s in each row in (2) are 
independent, but those in different rows may be arbitrarily dependent as in 
the case just mentioned — indeed they may be defined on different probability 
spaces without any relation to one another. Furthermore, let us introduce the 
following notation for moments, whenever they are defined, finite or infinite: 


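$$\begin{aligned}
\mathscr{E}(X_{nj}) &= \alpha_{nj}, & \sigma^2(X_{nj}) &= \sigma_{nj}^2; \\
\mathscr{E}(S_n) &= \sum_{j=1}^{k_n} \alpha_{nj} = \alpha_n, & \sigma^2(S_n) &= \sum_{j=1}^{k_n} \sigma_{nj}^2 = s_n^2; \\
\mathscr{E}(|X_{nj}|^3) &= \gamma_{nj}, & \Gamma_n &= \sum_{j=1}^{k_n} \gamma_{nj}.
\end{aligned} \tag{3}$$

In the special case of (1), we have

$$X_{nj} = \frac{X_j}{b_n}, \qquad \sigma^2(X_{nj}) = \frac{\sigma^2(X_j)}{b_n^2}.$$

If we take $b_n = s_n$, then

$$\sum_{j=1}^{k_n} \sigma^2(X_{nj}) = 1. \tag{4}$$

By considering $X_{nj} - \alpha_{nj}$ instead of $X_{nj}$, we may suppose

$$\forall n, \forall j:\ \alpha_{nj} = 0 \tag{5}$$

whenever the means exist. The reduction (sometimes called “norming”)
leading to (4) and (5) is always available if each $X_{nj}$ has a finite second
moment, and we shall assume this in the following.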

In dealing with the double array (2), it is essential to impose a hypothesis
that individual terms in the sum

$$S_n = \sum_{j=1}^{k_n} X_{nj}$$

are “negligible” in comparison with the sum itself. Historically, this arose
from the assumption that “small errors” accumulate to cause probabilistically
predictable random mass phenomena. We shall see later that such a hypothesis
is indeed necessary in a reasonable criterion for the central limit theorem such
as Theorem 7.2.1.

In order to clarify the intuitive notion of the negligibility, let us consider
the following hierarchy of conditions, each to be satisfied for every $\epsilon > 0$:

(a) $\forall j:\ \lim_{n\to\infty} \mathscr{P}\{|X_{nj}| > \epsilon\} = 0$;
(b) $\lim_{n\to\infty} \max_{1\le j\le k_n} \mathscr{P}\{|X_{nj}| > \epsilon\} = 0$;
(c) $\lim_{n\to\infty} \mathscr{P}\{\max_{1\le j\le k_n} |X_{nj}| > \epsilon\} = 0$;
(d) $\lim_{n\to\infty} \sum_{j=1}^{k_n} \mathscr{P}\{|X_{nj}| > \epsilon\} = 0$.

It is clear that (d) ⇒ (c) ⇒ (b) ⇒ (a); see Exercise 1 below. It turns out that
(b) is the appropriate condition, which will be given a name.
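
The gap between the conditions can be seen on two toy arrays. The sketch below is illustrative only: both arrays are assumptions chosen for the example, not taken from the text. The first is $X_{nj} = Z_j/\sqrt{n}$ with unit normal $Z_j$, for which even (d) holds; the second has $X_{nj}$ equal to 1 with probability $1/n$ and 0 otherwise, independently for $1 \le j \le n$, for which (b) holds while (c) and (d) fail.

```python
import math

def gaussian_array(n, eps):
    """Quantities in (b) and (d) for the assumed array X_nj = Z_j / sqrt(n), 1 <= j <= n."""
    p = math.erfc(eps * math.sqrt(n / 2.0))   # P{|Z| > eps * sqrt(n)}
    return p, n * p                           # max_j P{...}, sum_j P{...}

def bernoulli_array(n):
    """Quantities in (b), (c), (d) for independent X_nj with P{X_nj = 1} = 1/n (eps < 1)."""
    p = 1.0 / n
    prob_max_exceeds = 1.0 - (1.0 - p) ** n   # P{max_j X_nj > eps}
    return p, prob_max_exceeds, n * p

for n in (10, 100, 1000):
    print(n, gaussian_array(n, 0.5), bernoulli_array(n))
```

In the second array the quantity in (c) tends to $1 - e^{-1}$ and the one in (d) stays equal to 1, so the array is holospoudic in the sense defined next without satisfying the stronger conditions.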


DEFINITION. The double array (2) is said to be holospoudic* iff (b) holds. 


Theorem 7.1.1. A necessary and sufficient condition for (2) to be
holospoudic is:

$$\forall t \in \mathcal{R}^1:\ \lim_{n\to\infty} \max_{1\le j\le k_n} |f_{nj}(t) - 1| = 0. \tag{6}$$


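PROOF. Assuming that (b) is true, we have

$$\begin{aligned}
|f_{nj}(t) - 1| &= \left|\int (e^{itx} - 1)\,dF_{nj}(x)\right| = \left|\int_{|x|>\epsilon} + \int_{|x|\le\epsilon}\right| \\
&\le \int_{|x|>\epsilon} 2\,dF_{nj}(x) + |t| \int_{|x|\le\epsilon} |x|\,dF_{nj}(x) \\
&\le 2\int_{|x|>\epsilon} dF_{nj}(x) + \epsilon|t|;
\end{aligned}$$

and consequently

$$\max_j |f_{nj}(t) - 1| \le 2\max_j \mathscr{P}\{|X_{nj}| > \epsilon\} + \epsilon|t|.$$

Letting $n \to \infty$, then $\epsilon \to 0$, we obtain (6). Conversely, we have by the
inequality (2) of Sec. 6.3,

$$\int_{|x|>\epsilon} dF_{nj}(x) \le 2 - \left|\frac{\epsilon}{2} \int_{|t|\le 2/\epsilon} f_{nj}(t)\,dt\right| \le \frac{\epsilon}{2} \int_{|t|\le 2/\epsilon} |1 - f_{nj}(t)|\,dt;$$

and consequently

$$\max_j \mathscr{P}\{|X_{nj}| > \epsilon\} \le \frac{\epsilon}{2} \int_{|t|\le 2/\epsilon} \max_j |1 - f_{nj}(t)|\,dt.$$

Letting $n \to \infty$, the right-hand side tends to 0 by (6) and bounded convergence;
hence (b) follows. Note that the basic independence assumption is not
needed in this theorem.

*I am indebted to Professor M. Wigodsky for suggesting this word, the only new term coined
in this book.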


We shall first give Liapounov’s form of the central limit theorem 
involving the third moment by a typical use of the ch.f. From this we deduce 
a special case concerning bounded r.v.’s. From this, we deduce the sufficiency 
of Lindeberg’s condition by direct arguments. Finally we give Feller’s proof
of the necessity of that condition by a novel application of the ch.f.

In order to bring out the almost mechanical nature of Liapounov’s result 
we state and prove a lemma in calculus. 


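Lemma. Let $\{\theta_{nj}, 1 \le j \le k_n, 1 \le n\}$ be a double array of complex numbers
satisfying the following conditions as $n \to \infty$:

(i) $\max_{1\le j\le k_n} |\theta_{nj}| \to 0$;
(ii) $\sum_{j=1}^{k_n} |\theta_{nj}| \le M < \infty$, where $M$ does not depend on $n$;
(iii) $\sum_{j=1}^{k_n} \theta_{nj} \to \theta$, where $\theta$ is a (finite) complex number.

Then we have

$$\prod_{j=1}^{k_n} (1 + \theta_{nj}) \to e^{\theta}. \tag{7}$$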


PROOF. By (i), there exists $n_0$ such that if $n \ge n_0$, then $|\theta_{nj}| \le \frac{1}{2}$ for all
$j$, so that $1 + \theta_{nj} \ne 0$. We shall consider only such large values of $n$ below,
and we shall denote by $\log(1 + \theta_{nj})$ the determination of logarithm with an
angle in $(-\pi, \pi]$. Thus

$$\log(1 + \theta_{nj}) = \theta_{nj} + \Lambda|\theta_{nj}|^2, \tag{8}$$

where $\Lambda$ is a complex number depending on various variables but bounded
by some absolute constant not depending on anything, and is not necessarily
the same in each appearance. In the present case we have in fact

$$|\log(1 + \theta_{nj}) - \theta_{nj}| = \left|\sum_{m=2}^{\infty} \frac{(-1)^{m-1}}{m}\,\theta_{nj}^m\right| \le \sum_{m=2}^{\infty} \frac{|\theta_{nj}|^m}{m} \le \frac{|\theta_{nj}|^2}{2} \sum_{m=2}^{\infty} \left(\frac{1}{2}\right)^{m-2} = |\theta_{nj}|^2 \le 1,$$

so that the absolute constant mentioned above may be taken to be 1. (The
reader will supply such a computation next time.) Hence

$$\sum_{j=1}^{k_n} \log(1 + \theta_{nj}) = \sum_{j=1}^{k_n} \theta_{nj} + \Lambda \sum_{j=1}^{k_n} |\theta_{nj}|^2.$$

(This $\Lambda$ is not the same as before, but bounded by the same 1!). It follows
from (ii) and (i) that

$$\sum_{j=1}^{k_n} |\theta_{nj}|^2 \le \max_{1\le j\le k_n} |\theta_{nj}| \sum_{j=1}^{k_n} |\theta_{nj}| \le M \max_{1\le j\le k_n} |\theta_{nj}| \to 0; \tag{9}$$

and consequently we have by (iii),

$$\sum_{j=1}^{k_n} \log(1 + \theta_{nj}) \to \theta.$$

This is equivalent to (7).
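
The role of the three hypotheses can be checked numerically. The following sketch uses two hypothetical arrays that are not from the text: $\theta_{nj} = 2j\theta/(n(n+1))$, which satisfies (i) to (iii) with limit $\theta$, and $\theta_{nj} = (-1)^j/\sqrt{n}$ for even $n$, which satisfies (i) and (iii) with $\theta = 0$ but violates (ii).

```python
import cmath
import math

def product_good(n, theta):
    """Product of (1 + theta_nj) for the assumed row theta_nj = 2 j theta / (n (n + 1))."""
    prod = 1.0 + 0.0j
    for j in range(1, n + 1):
        prod *= 1.0 + theta * 2.0 * j / (n * (n + 1))
    return prod

def product_bad(n):
    """Product for theta_nj = (-1)^j / sqrt(n), n even: the row sums are 0,
    but sum_j |theta_nj| = sqrt(n) is unbounded, so hypothesis (ii) fails."""
    prod = 1.0
    for j in range(1, n + 1):
        prod *= 1.0 + (-1) ** j / math.sqrt(n)
    return prod

theta = -0.5 + 1.0j
print(abs(product_good(20000, theta) - cmath.exp(theta)))  # small, as (7) predicts
print(product_bad(10000), math.exp(-0.5))                  # limit is e^{-1/2}, not e^0
```

In the second array the product equals $(1 - 1/n)^{n/2}$, which converges to $e^{-1/2}$ rather than $e^0 = 1$, so condition (ii) cannot simply be dropped.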


Theorem 7.1.2. Assume that (4) and (5) hold for the double array (2) and
that $\gamma_{nj}$ is finite for every $n$ and $j$. If

$$\Gamma_n \to 0 \tag{10}$$

as $n \to \infty$, then $S_n$ converges in dist. to $\Phi$.


PROOF. For each $n$, the range of $j$ below will be from 1 to $k_n$. It follows
from the assumption (10) and Liapounov’s inequality that

$$\max_j \sigma_{nj}^3 \le \max_j \gamma_{nj} \le \Gamma_n \to 0. \tag{11}$$

By (3′) of Theorem 6.4.2, we have

$$f_{nj}(t) = 1 - \tfrac{1}{2}\sigma_{nj}^2 t^2 + \Lambda_{nj}\gamma_{nj}|t|^3,$$

where $|\Lambda_{nj}| \le \frac{1}{6}$. We apply the lemma above, for a fixed $t$, to

$$\theta_{nj} = -\tfrac{1}{2}\sigma_{nj}^2 t^2 + \Lambda_{nj}\gamma_{nj}|t|^3.$$

Condition (i) is satisfied since

$$\max_j |\theta_{nj}| \le \frac{t^2}{2} \max_j \sigma_{nj}^2 + \Lambda|t|^3 \max_j \gamma_{nj} \to 0$$

by (11). Condition (ii) is satisfied since

$$\sum_j |\theta_{nj}| \le \frac{t^2}{2} + \Lambda|t|^3\,\Gamma_n$$

is bounded by (11); similarly condition (iii) is satisfied since

$$\sum_j \theta_{nj} = -\frac{t^2}{2} + \Lambda|t|^3\,\Gamma_n \to -\frac{t^2}{2}.$$

It follows that

$$\prod_{j=1}^{k_n} f_{nj}(t) \to e^{-t^2/2}.$$

This establishes the theorem by the convergence theorem of Sec. 6.3, since
the left member is the ch.f. of $S_n$.
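
A small simulation can make the theorem concrete. The array below is a hypothetical example, not one from the text: row $n$ consists of $X_{nj} = c_{nj} R_j$, $1 \le j \le n$, where the $R_j$ are independent signs $\pm 1$ and the weights $c_{nj} \propto \sqrt{j}$ are scaled so that (4) holds. Then $\gamma_{nj} = c_{nj}^3$ and $\Gamma_n$ is of order $n^{-1/2}$.

```python
import math
import random

def weights(n):
    """Weights c_nj proportional to sqrt(j), normalized so that sum_j c_nj^2 = 1."""
    raw = [math.sqrt(j) for j in range(1, n + 1)]
    s = math.sqrt(sum(w * w for w in raw))
    return [w / s for w in raw]

def gamma_n(n):
    """Gamma_n = sum_j E|X_nj|^3 = sum_j c_nj^3 for the sign-weighted array."""
    return sum(c ** 3 for c in weights(n))

def empirical_cdf(n, x, trials=4000, seed=0):
    """Monte Carlo estimate of P{S_n <= x} for row n of the assumed array."""
    rng = random.Random(seed)
    c = weights(n)
    hits = 0
    for _ in range(trials):
        s = sum(cj if rng.random() < 0.5 else -cj for cj in c)
        hits += s <= x
    return hits / trials

def phi(x):
    """Unit normal d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

print([round(gamma_n(n), 4) for n in (25, 100, 400)])   # decreasing toward 0
print(empirical_cdf(200, 1.0), phi(1.0))
```

The simulation only illustrates the conclusion at one point $x$ and one row; it is not a substitute for the ch.f. argument above.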


Corollary. Without supposing that $\mathscr{E}(X_{nj}) = 0$, suppose that for each $n$ and
$j$ there is a finite constant $M_{nj}$ such that $|X_{nj}| \le M_{nj}$ a.e., and that

$$\max_{1\le j\le k_n} M_{nj} \to 0. \tag{12}$$

Then $S_n - \mathscr{E}(S_n)$ converges in dist. to $\Phi$.

This follows at once by the trivial inequality

$$\sum_{j=1}^{k_n} \mathscr{E}(|X_{nj} - \mathscr{E}(X_{nj})|^3) \le 2\max_{1\le j\le k_n} M_{nj} \sum_{j=1}^{k_n} \sigma^2(X_{nj}) = 2\max_{1\le j\le k_n} M_{nj}.$$


The usual formulation of Theorem 7.1.2 for a single sequence of independent
r.v.’s $\{X_j\}$ with $\mathscr{E}(X_j) = 0$, $\sigma^2(X_j) = \sigma_j^2 < \infty$, $\mathscr{E}(|X_j|^3) = \gamma_j < \infty$,

$$S_n = \sum_{j=1}^{n} X_j, \qquad s_n^2 = \sum_{j=1}^{n} \sigma_j^2, \qquad \Gamma_n = \sum_{j=1}^{n} \gamma_j, \tag{13}$$

is as follows.

If

$$\frac{\Gamma_n}{s_n^3} \to 0, \tag{14}$$

then $S_n/s_n$ converges in dist. to $\Phi$.

This is obtained by setting $X_{nj} = X_j/s_n$. It should be noticed that the double
scheme gets rid of cumbersome fractions in proofs as well as in statements.
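
To see condition (14) at work on a concrete single sequence, the sketch below assumes that $X_j$ is uniformly distributed in $[-j, j]$ (the sequence of Exercise 8 of Sec. 7.2), for which $\sigma_j^2 = j^2/3$ and $\gamma_j = j^3/4$. The function name `liapounov_ratio` is hypothetical.

```python
import math

def liapounov_ratio(n):
    """Gamma_n / s_n^3 for independent X_j uniform on [-j, j], 1 <= j <= n."""
    gamma = sum(j ** 3 / 4.0 for j in range(1, n + 1))   # sum of E|X_j|^3
    s2 = sum(j ** 2 / 3.0 for j in range(1, n + 1))      # sum of variances
    return gamma / s2 ** 1.5

for n in (10, 100, 10000):
    print(n, liapounov_ratio(n), liapounov_ratio(n) * math.sqrt(n))
```

The ratio behaves like $(27/16)\,n^{-1/2}$, so (14) holds and $S_n/s_n$ converges in dist. to $\Phi$ for this sequence.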

We proceed to another proof of Liapounov’s theorem for a single 
sequence by the method of Lindeberg, who actually used it to prove his 
version of the central limit theorem, see Theorem 7.2.1 below. In recent times 
this method has been developed into a general tool by Trotter and Feller. 

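The idea of Lindeberg is to approximate the sum $X_1 + \cdots + X_n$ in (13)
successively by replacing one $X$ at a time with a comparable normal (Gaussian)
r.v. $Y$, as follows. Let $\{Y_j, j \ge 1\}$ be r.v.’s having the normal distribution
$N(0, \sigma_j^2)$; thus $Y_j$ has the same mean and variance as the corresponding $X_j$
above; let all the $X$’s and $Y$’s be totally independent. Now put

$$Z_j = Y_1 + \cdots + Y_{j-1} + X_{j+1} + \cdots + X_n, \qquad 1 \le j \le n,$$

with the obvious convention that

$$Z_1 = X_2 + \cdots + X_n, \qquad Z_n = Y_1 + \cdots + Y_{n-1}.$$

To compare the distribution of $(X_j + Z_j)/s_n$ with that of $(Y_j + Z_j)/s_n$, we
use Theorem 6.1.6 by comparing the expectations of test functions. Namely,
we estimate the difference below for a suitable class of functions $f$:

$$\mathscr{E}\left\{f\left(\frac{X_1 + \cdots + X_n}{s_n}\right)\right\} - \mathscr{E}\left\{f\left(\frac{Y_1 + \cdots + Y_n}{s_n}\right)\right\} = \sum_{j=1}^{n} \left[\mathscr{E}\left\{f\left(\frac{X_j + Z_j}{s_n}\right)\right\} - \mathscr{E}\left\{f\left(\frac{Y_j + Z_j}{s_n}\right)\right\}\right]. \tag{15}$$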

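This equation follows by telescoping since $Y_j + Z_j = X_{j+1} + Z_{j+1}$. We take
$f$ in $C_B^3$, the class of bounded continuous functions with three bounded continuous
derivatives. By Taylor’s theorem, we have for every $x$ and $y$:

$$\left|f(x + y) - \left[f(x) + f'(x)\,y + \frac{f''(x)}{2}\,y^2\right]\right| \le \frac{M|y|^3}{6},$$

where $M = \sup_{x \in \mathcal{R}^1} |f^{(3)}(x)|$. Hence if $\xi$ and $\eta$ are independent r.v.’s such that
$\mathscr{E}\{|\eta|^3\} < \infty$, we have by substitution followed by integration:

$$\left|\mathscr{E}\{f(\xi + \eta)\} - \mathscr{E}\{f(\xi)\} - \mathscr{E}\{f'(\xi)\}\mathscr{E}\{\eta\} - \tfrac{1}{2}\mathscr{E}\{f''(\xi)\}\mathscr{E}\{\eta^2\}\right| \le \frac{M}{6}\,\mathscr{E}\{|\eta|^3\}. \tag{16}$$

Note that the r.v.’s $f(\xi)$, $f'(\xi)$, and $f''(\xi)$ are bounded hence integrable. If $\zeta$
is another r.v. independent of $\xi$ and having the same mean and variance as $\eta$,
and $\mathscr{E}\{|\zeta|^3\} < \infty$, we obtain by replacing $\eta$ with $\zeta$ in (16) and then taking the
difference:

$$\left|\mathscr{E}\{f(\xi + \eta)\} - \mathscr{E}\{f(\xi + \zeta)\}\right| \le \frac{M}{6}\,\mathscr{E}\{|\eta|^3 + |\zeta|^3\}. \tag{17}$$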

This key formula is applied to each term on the right side of (15), with
$\xi = Z_j/s_n$, $\eta = X_j/s_n$, $\zeta = Y_j/s_n$. The bounds on the right-hand side of (17)
then add up to

$$\frac{M}{6} \sum_{j=1}^{n} \left\{\frac{\gamma_j}{s_n^3} + \frac{c\,\sigma_j^3}{s_n^3}\right\} \tag{18}$$

where $c = \sqrt{8/\pi}$ since the absolute third moment of $N(0, \sigma^2)$ is equal to $c\sigma^3$.
By Liapounov’s inequality (Sec. 3.2) $\sigma_j^3 \le \gamma_j$, so that the quantity in (18) is
$O(\Gamma_n/s_n^3)$. Let us introduce a unit normal r.v. $N$ for convenience of notation,
so that $(Y_1 + \cdots + Y_n)/s_n$ may be replaced by $N$ so far as its distribution is
concerned. We have thus obtained the following estimate:

$$\forall f \in C_B^3:\ \left|\mathscr{E}\left\{f\left(\frac{S_n}{s_n}\right)\right\} - \mathscr{E}\{f(N)\}\right| \le O\left(\frac{\Gamma_n}{s_n^3}\right). \tag{19}$$

Consequently, under the condition (14), this converges to zero as $n \to \infty$. It
follows by the general criterion for vague convergence in Theorem 6.1.6 that
$S_n/s_n$ converges in distribution to the unit normal. This is Liapounov’s form of
the central limit theorem proved above by the method of ch.f.’s. Lindeberg’s
idea yields a by-product, due to Pinsky, which will be proved under the same
assumptions as in Liapounov’s theorem above.
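
The constant $c = \sqrt{8/\pi}$ and the replacement estimate can be checked numerically in one special case. The sketch below assumes $X_j = \pm 1$ with probability $\frac{1}{2}$ each and the test function $f = \cos$, for which $M = \sup|f^{(3)}| = 1$, $\mathscr{E}\{f(S_n/s_n)\} = \cos^n(1/\sqrt{n})$ and $\mathscr{E}\{f(N)\} = e^{-1/2}$. These choices are illustrative assumptions, not part of the text.

```python
import math

C = math.sqrt(8.0 / math.pi)   # claimed value of E|N|^3 for a unit normal N

def abs_third_moment_normal(upper=12.0, steps=120000):
    """Trapezoidal value of E|N|^3 = 2 * integral_0^inf x^3 phi(x) dx."""
    dx = upper / steps
    total = 0.0   # the integrand vanishes at 0 and is negligible at `upper`
    for k in range(1, steps):
        x = k * dx
        total += x ** 3 * math.exp(-0.5 * x * x)
    return 2.0 * total * dx / math.sqrt(2.0 * math.pi)

def replacement_gap(n):
    """|E f(S_n/s_n) - E f(N)| for f = cos and X_j = +-1 with probability 1/2 each."""
    return abs(math.cos(1.0 / math.sqrt(n)) ** n - math.exp(-0.5))

def bound_18(n, M=1.0):
    """(M/6) * sum_j (gamma_j + c * sigma_j^3) / s_n^3 with gamma_j = sigma_j = 1, s_n^2 = n."""
    return (M / 6.0) * (1.0 + C) / math.sqrt(n)

print(abs_third_moment_normal(), C)
for n in (10, 100, 1000):
    print(n, replacement_gap(n), bound_18(n))
```

For this symmetric example the actual gap is of order $1/n$, well inside the bound of order $n^{-1/2}$ obtained by adding up the right-hand sides of the key inequality.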


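Theorem 7.1.3. Let $\{x_n\}$ be a sequence of real numbers increasing to $+\infty$
but subject to the growth condition: for some $\epsilon > 0$,

$$\log\frac{\Gamma_n}{s_n^3} + \frac{x_n^2}{2}(1 + \epsilon) \to -\infty \tag{20}$$

as $n \to \infty$. Then for this $\epsilon$, there exists $N$ such that for all $n \ge N$ we have

$$\exp\left[-\frac{x_n^2}{2}(1 + \epsilon)\right] \le \mathscr{P}\{S_n \ge x_n s_n\} \le \exp\left[-\frac{x_n^2}{2}(1 - \epsilon)\right]. \tag{21}$$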


PROOF. This is derived by choosing particular test functions in (19). Let
$f \in C^3$ be such that

$$f(x) = 0 \ \text{ for } x \le -\tfrac{1}{2}; \qquad 0 \le f(x) \le 1 \ \text{ for } -\tfrac{1}{2} < x < \tfrac{1}{2}; \qquad f(x) = 1 \ \text{ for } x \ge \tfrac{1}{2};$$

and put for all $x$:

$$f_n(x) = f(x - x_n - \tfrac{1}{2}), \qquad g_n(x) = f(x - x_n + \tfrac{1}{2}).$$

Thus we have, denoting by $I_B$ the indicator function of $B \subset \mathcal{R}^1$:

$$I_{[x_n+1,\infty)} \le f_n(x) \le I_{[x_n,\infty)} \le g_n(x) \le I_{[x_n-1,\infty)}.$$

It follows that

$$\mathscr{E}\left\{f_n\left(\frac{S_n}{s_n}\right)\right\} \le \mathscr{P}\{S_n \ge x_n s_n\} \le \mathscr{E}\left\{g_n\left(\frac{S_n}{s_n}\right)\right\} \tag{22}$$

whereas

$$\mathscr{P}\{N \ge x_n + 1\} \le \mathscr{E}\{f_n(N)\} \le \mathscr{E}\{g_n(N)\} \le \mathscr{P}\{N \ge x_n - 1\}. \tag{23}$$

Using (19) for $f = f_n$ and $f = g_n$, and combining the results with (22) and
(23), we obtain

$$\mathscr{P}\{N \ge x_n + 1\} - O\left(\frac{\Gamma_n}{s_n^3}\right) \le \mathscr{P}\{S_n \ge x_n s_n\} \le \mathscr{P}\{N \ge x_n - 1\} + O\left(\frac{\Gamma_n}{s_n^3}\right). \tag{24}$$

Now an elementary estimate yields for $x \to +\infty$:

$$\mathscr{P}\{N \ge x\} = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\left(-\frac{y^2}{2}\right) dy \sim \frac{1}{\sqrt{2\pi}\,x} \exp\left(-\frac{x^2}{2}\right)$$

(see Exercise 4 of Sec. 7.4), and a quick computation shows further that

$$\mathscr{P}\{N \ge x \pm 1\} = \exp\left[-\frac{x^2}{2}(1 + o(1))\right], \qquad x \to +\infty.$$

Thus (24) may be written as

$$\mathscr{P}\{S_n \ge x_n s_n\} = \exp\left[-\frac{x_n^2}{2}(1 + o(1))\right] + O\left(\frac{\Gamma_n}{s_n^3}\right).$$

Suppose $n$ is so large that the $o(1)$ above is strictly less than $\epsilon$ in absolute
value; in order to conclude (21) it is sufficient to have

$$\frac{\Gamma_n}{s_n^3} = o\left(\exp\left[-\frac{x_n^2}{2}(1 + \epsilon)\right]\right), \qquad n \to \infty.$$

This is the sense of the condition (20), and the theorem is proved.

Recalling that $s_n$ is the standard deviation of $S_n$, we call the probability
in (21) that of a “large deviation”. Observe that the two estimates of this
probability given there have a ratio equal to $e^{\epsilon x_n^2}$, which is large for each $\epsilon$,
as $x_n \to +\infty$. Nonetheless, it is small relative to the principal factor $e^{-x_n^2/2}$
on the logarithmic scale, which is just to say that $\epsilon x_n^2$ is small in absolute
value compared with $-x_n^2/2$. Thus the estimates in (21) are useful on such a
scale, and may be applied in the proof of the law of the iterated logarithm in
Sec. 7.5.
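
The two facts about the normal tail used in the proof can be inspected numerically. The sketch below is a plain check with the standard library; the function names are hypothetical. It compares $\mathscr{P}\{N \ge x\}$ with $e^{-x^2/2}/(\sqrt{2\pi}\,x)$ and looks at $-\log \mathscr{P}\{N \ge x \pm 1\}$ relative to $x^2/2$.

```python
import math

def tail(x):
    """P{N >= x} for a unit normal N, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mills_ratio_check(x):
    """Ratio of P{N >= x} to exp(-x^2/2) / (sqrt(2 pi) x); tends to 1 as x grows."""
    return tail(x) * math.sqrt(2.0 * math.pi) * x / math.exp(-0.5 * x * x)

def log_scale_ratio(x, shift):
    """-log P{N >= x + shift} divided by x^2/2; tends to 1, but slowly."""
    return -math.log(tail(x + shift)) / (0.5 * x * x)

for x in (5.0, 10.0, 30.0):
    print(x, mills_ratio_check(x), log_scale_ratio(x, 1.0), log_scale_ratio(x, -1.0))
```

The slow approach of the last two columns to 1 shows why the statement is made only on the logarithmic scale: the shift by $\pm 1$ changes the probability by a large factor, yet changes its logarithm by a relatively small amount.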


EXERCISES 


*1. Prove that for arbitrary r.v.’s $\{X_{nj}\}$ in the array (2), the implications
(d) ⇒ (c) ⇒ (b) ⇒ (a) are all strict. On the other hand, if the $X_{nj}$’s are
independent in each row, then (d) ≡ (c).

2. For any sequence of r.v.’s $\{Y_n\}$, if $Y_n/b_n$ converges in dist. for an
increasing sequence of constants $\{b_n\}$, then $Y_n/b_n'$ converges in pr. to 0 if
$b_n = o(b_n')$. In particular, make precise the following statement: “The central
limit theorem implies the weak law of large numbers.”

3. For the double array (2), it is possible that $S_n/b_n$ converges in dist.
for a sequence of strictly positive constants $b_n$ tending to a finite limit. Is it
still possible if $b_n$ oscillates between finite limits?

4. Let $\{X_j\}$ be independent r.v.’s such that $\max_{1\le j\le n} |X_j|/b_n \to 0$ in
pr. and $(S_n - a_n)/b_n$ converges to a nondegenerate d.f. Then $b_n \to \infty$,
$b_{n+1}/b_n \to 1$, and $(a_{n+1} - a_n)/b_n \to 0$.

*5. In Theorem 7.1.2 let the d.f. of $S_n$ be $F_n$. Prove that given any $\epsilon > 0$,
there exists a $\delta(\epsilon)$ such that $\Gamma_n \le \delta(\epsilon) \Rightarrow L(F_n, \Phi) \le \epsilon$, where $L$ is Lévy
distance. Strengthen the conclusion to read:

$$\sup_{x \in \mathcal{R}^1} |F_n(x) - \Phi(x)| \le \epsilon.$$

*6. Prove the assertion made in Exercise 5 of Sec. 5.3 using the methods
of this section. [HINT: use Exercise 4 of Sec. 4.3.]


7.2 Lindeberg—Feller theorem 


We can now state the Lindeberg—Feller theorem for the double array (2) of 
Sec. 7.1 (with independence in each row). 


Theorem 7.2.1. Assume $\sigma_{nj}^2 < \infty$ for each $n$ and $j$ and the reduction
hypotheses (4) and (5) of Sec. 7.1. In order that as $n \to \infty$ the two conclusions
below both hold:

(i) $S_n$ converges in dist. to $\Phi$,
(ii) the double array (2) of Sec. 7.1 is holospoudic;

it is necessary and sufficient that for each $\eta > 0$, we have

$$\sum_{j=1}^{k_n} \int_{|x|>\eta} x^2\,dF_{nj}(x) \to 0. \tag{1}$$

The condition (1) is called Lindeberg’s condition; it is manifestly
equivalent to

$$\sum_{j=1}^{k_n} \int_{|x|\le\eta} x^2\,dF_{nj}(x) \to 1. \tag{1′}$$


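PROOF. Sufficiency. By the argument that leads to Chebyshev’s inequality,
we have

$$\mathscr{P}\{|X_{nj}| > \eta\} \le \frac{1}{\eta^2} \int_{|x|>\eta} x^2\,dF_{nj}(x). \tag{2}$$

Hence (ii) follows from (1); indeed even the stronger form of negligibility (d)
in Sec. 7.1 follows. Now for a fixed $\eta$, $0 < \eta < 1$, we truncate $X_{nj}$ as follows:

$$X_{nj}' = \begin{cases} X_{nj}, & \text{if } |X_{nj}| \le \eta; \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Put $S_n' = \sum_{j=1}^{k_n} X_{nj}'$, $\sigma^2(S_n') = s_n'^2$. We have, since $\mathscr{E}(X_{nj}) = 0$,

$$\mathscr{E}(X_{nj}') = \int_{|x|\le\eta} x\,dF_{nj}(x) = -\int_{|x|>\eta} x\,dF_{nj}(x).$$

Hence

$$|\mathscr{E}(X_{nj}')| \le \int_{|x|>\eta} |x|\,dF_{nj}(x) \le \frac{1}{\eta} \int_{|x|>\eta} x^2\,dF_{nj}(x)$$

and so by (1),

$$|\mathscr{E}(S_n')| \le \frac{1}{\eta} \sum_{j=1}^{k_n} \int_{|x|>\eta} x^2\,dF_{nj}(x) \to 0.$$

Next we have by the Cauchy-Schwarz inequality

$$[\mathscr{E}(X_{nj}')]^2 \le \int_{|x|>\eta} x^2\,dF_{nj}(x) \int_{|x|>\eta} 1\,dF_{nj}(x),$$

and consequently

$$\sigma^2(X_{nj}') = \int_{|x|\le\eta} x^2\,dF_{nj}(x) - \mathscr{E}(X_{nj}')^2 \ge \left\{\int_{|x|\le\eta} - \int_{|x|>\eta}\right\} x^2\,dF_{nj}(x).$$

It follows by (1′) that

$$1 = s_n^2 \ge s_n'^2 \ge \sum_{j=1}^{k_n} \left\{\int_{|x|\le\eta} - \int_{|x|>\eta}\right\} x^2\,dF_{nj}(x) \to 1.$$

Thus as $n \to \infty$, we have

$$s_n' \to 1 \quad \text{and} \quad \mathscr{E}(S_n') \to 0.$$

Since

$$S_n' = \left\{\frac{S_n' - \mathscr{E}(S_n')}{s_n'} + \frac{\mathscr{E}(S_n')}{s_n'}\right\} s_n',$$

we conclude (see Theorem 4.4.6, but the situation is even simpler here) that
if $[S_n' - \mathscr{E}(S_n')]/s_n'$ converges in dist., so will $S_n'/s_n'$ to the same d.f.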

Now we try to apply the corollary to Theorem 7.1.2 to the double array
$\{X_{nj}'\}$. We have $|X_{nj}'| \le \eta$, so that the left member in (12) of Sec. 7.1
corresponding to this array is bounded by $\eta$. But although $\eta$ is at our disposal and
may be taken as small as we please, it is fixed with respect to $n$ in the above.
Hence we cannot yet make use of the cited corollary. What is needed is the
following lemma, which is useful in similar circumstances.


Lemma 1. Let $u(m, n)$ be a function of positive integers $m$ and $n$ such that

$$\forall m:\ \lim_{n\to\infty} u(m, n) = 0.$$

Then there exists a sequence $\{m_n\}$ increasing to $\infty$ such that

$$\lim_{n\to\infty} u(m_n, n) = 0.$$

PROOF. It is the statement of the lemma and its necessity in our application
that requires a certain amount of sophistication; the proof is easy. For
each $m$, there is an $n_m$ such that $n \ge n_m \Rightarrow u(m, n) \le 1/m$. We may choose
$\{n_m, m \ge 1\}$ inductively so that $n_m$ increases strictly with $m$. Now define
($n_0 = 1$)

$$m_n = m \quad \text{for } n_m \le n < n_{m+1}.$$

Then

$$u(m_n, n) \le \frac{1}{m} \quad \text{for } n_m \le n < n_{m+1},$$

and consequently the lemma is proved.

We apply the lemma to (1) as follows. For each $m \ge 1$, we have

$$\lim_{n\to\infty} m^2 \sum_{j=1}^{k_n} \int_{|x|>1/m} x^2\,dF_{nj}(x) = 0.$$

It follows that there exists a sequence $\{\eta_n\}$ decreasing to 0 such that

$$\frac{1}{\eta_n^2} \sum_{j=1}^{k_n} \int_{|x|>\eta_n} x^2\,dF_{nj}(x) \to 0.$$

Now we can go back and modify the definition in (3) by replacing $\eta$ with
$\eta_n$. As indicated above, the cited corollary becomes applicable and yields the
convergence of $[S_n' - \mathscr{E}(S_n')]/s_n'$ in dist. to $\Phi$, hence also that of $S_n'/s_n'$ as
remarked.
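
The diagonal construction in Lemma 1 can be sketched in a few lines. The function $u(m, n) = m^2/n$ used here is a hypothetical example, for which one may take $n_m = m^3$; the helper names are assumptions of this sketch.

```python
import bisect

def u(m, n):
    """Hypothetical example: u(m, n) = m^2 / n tends to 0 as n grows, for each fixed m."""
    return m * m / n

def thresholds(max_m):
    """n_m = m^3: for n >= n_m we have u(m, n) <= 1/m, and n_m increases strictly with m."""
    return [m ** 3 for m in range(1, max_m + 1)]

def m_of(n, n_m):
    """m_n = the m with n_m <= n < n_{m+1}, i.e. the number of thresholds not exceeding n."""
    return bisect.bisect_right(n_m, n)

n_m = thresholds(200)
for n in (1, 8, 1000, 10 ** 6):
    m = m_of(n, n_m)
    print(n, m, u(m, n), 1.0 / m)
```

Along the sequence $m_n$ so defined, $u(m_n, n) \le 1/m_n \to 0$ although $m_n \to \infty$; choosing $m_n$ growing too fast (for instance $m_n = n$ here) would destroy the convergence.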

Finally we must go from $S_n'$ to $S_n$. The idea is similar to Theorem 5.2.1
but simpler. Observe that, for the modified $X_{nj}'$ in (3) with $\eta$ replaced by $\eta_n$,
we have

$$\mathscr{P}\{S_n \ne S_n'\} \le \mathscr{P}\left\{\bigcup_{j=1}^{k_n} [X_{nj} \ne X_{nj}']\right\} \le \sum_{j=1}^{k_n} \mathscr{P}\{|X_{nj}| > \eta_n\} \le \sum_{j=1}^{k_n} \frac{1}{\eta_n^2} \int_{|x|>\eta_n} x^2\,dF_{nj}(x),$$

the last inequality from (2). As $n \to \infty$, the last term tends to 0 by the above,
hence $S_n$ must have the same limit distribution as $S_n'$ (why?) and the sufficiency
of Lindeberg’s condition is proved. Although this method of proof is somewhat 
longer than a straightforward approach by means of ch.f.’s (Exercise 4 below), 
it involves several good ideas that can be used on other occasions. Indeed the 
sufficiency part of the most general form of the central limit theorem (see 
below) can be proved in the same way. 

Necessity. Here we must resort entirely to manipulations with ch.f.’s. By
the convergence theorem of Sec. 6.3 and Theorem 7.1.1, the conditions (i)
and (ii) are equivalent to:

$$\forall t:\ \lim_{n\to\infty} \prod_{j=1}^{k_n} f_{nj}(t) = e^{-t^2/2}; \tag{4}$$

$$\forall t:\ \lim_{n\to\infty} \max_{1\le j\le k_n} |f_{nj}(t) - 1| = 0. \tag{5}$$

By Theorem 6.3.1, the convergence in (4) is uniform in $|t| \le T$ for each finite
$T$; similarly for (5) by Theorem 7.1.1. Hence for each $T$ there exists $n_0(T)$
such that if $n \ge n_0(T)$, then

$$\max_{|t|\le T} \max_{1\le j\le k_n} |f_{nj}(t) - 1| \le \frac{1}{2}.$$

We shall consider only such values of $n$ below. We may take the distinguished
logarithms (see Theorems 7.6.2 and 7.6.3 below) to conclude that

$$\lim_{n\to\infty} \sum_{j=1}^{k_n} \log f_{nj}(t) = -\frac{t^2}{2}. \tag{6}$$

By (8) and (9) of Sec. 7.1, we have

$$\log f_{nj}(t) = f_{nj}(t) - 1 + \Lambda|f_{nj}(t) - 1|^2; \tag{7}$$

$$\sum_{j=1}^{k_n} |f_{nj}(t) - 1|^2 \le \max_{1\le j\le k_n} |f_{nj}(t) - 1| \sum_{j=1}^{k_n} |f_{nj}(t) - 1|. \tag{8}$$

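Now the last-written sum is, with some $\theta$, $|\theta| \le 1$:

$$\sum_j \left|\int_{-\infty}^{\infty} (e^{itx} - 1)\,dF_{nj}(x)\right| = \sum_j \left|\int_{-\infty}^{\infty} \left(itx + \theta\,\frac{t^2 x^2}{2}\right) dF_{nj}(x)\right| \le \frac{t^2}{2} \sum_j \int_{-\infty}^{\infty} x^2\,dF_{nj}(x) = \frac{t^2}{2}. \tag{9}$$

Hence it follows from (5) and (9) that the left member of (8) tends to 0 as
$n \to \infty$. From this, (7), and (6) we obtain

$$\lim_{n\to\infty} \sum_j \{f_{nj}(t) - 1\} = -\frac{t^2}{2}.$$

Taking real parts, we have

$$\lim_{n\to\infty} \sum_j \int_{-\infty}^{\infty} (1 - \cos tx)\,dF_{nj}(x) = \frac{t^2}{2}.$$

Hence for each $\eta > 0$, if we split the integral into two parts and transpose one
of them, we obtain

$$\begin{aligned}
\limsup_{n\to\infty} \left|\frac{t^2}{2} - \sum_j \int_{|x|\le\eta} (1 - \cos tx)\,dF_{nj}(x)\right|
&= \limsup_{n\to\infty} \left|\sum_j \int_{|x|>\eta} (1 - \cos tx)\,dF_{nj}(x)\right| \\
&\le \limsup_{n\to\infty} \sum_j \int_{|x|>\eta} 2\,dF_{nj}(x) \le 2\,\limsup_{n\to\infty} \sum_j \frac{\sigma_{nj}^2}{\eta^2} = \frac{2}{\eta^2},
\end{aligned}$$

the last inequality above by Chebyshev’s inequality. Since $0 \le 1 - \cos\theta \le
\theta^2/2$ for every real $\theta$, this implies

$$\frac{2}{\eta^2} \ge \limsup_{n\to\infty} \left\{\frac{t^2}{2} - \frac{t^2}{2} \sum_j \int_{|x|\le\eta} x^2\,dF_{nj}(x)\right\} \ge 0,$$

the quantity in braces being clearly positive. Thus

$$\frac{4}{t^2\eta^2} \ge \limsup_{n\to\infty} \left\{1 - \sum_{j=1}^{k_n} \int_{|x|\le\eta} x^2\,dF_{nj}(x)\right\};$$

$t$ being arbitrarily large while $\eta$ is fixed; this implies Lindeberg’s condition
(1′). Theorem 7.2.1 is completely proved.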
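
Lindeberg’s condition can be evaluated in closed form for simple arrays. The sketch below assumes the single sequence of independent $X_j$ uniform on $[-j, j]$ (the setting of Exercise 8 below), normed by $s_n$, and computes the sum in (1) for a given $\eta$. The function name is hypothetical.

```python
import math

def lindeberg_sum_uniform(n, eta):
    """Sum over j of the integral of x^2 dF_nj over |x| > eta, where
    X_nj = X_j / s_n and X_j is uniform on [-j, j], 1 <= j <= n."""
    s2 = sum(j * j / 3.0 for j in range(1, n + 1))   # s_n^2 = sum of the variances j^2/3
    a = eta * math.sqrt(s2)                          # truncation level eta * s_n for X_j
    total = 0.0
    for j in range(1, n + 1):
        if j > a:
            # integral of y^2 / (2j) over a < |y| <= j
            total += (j ** 3 - a ** 3) / (3.0 * j)
    return total / s2

for n in (10, 30, 100):
    print(n, lindeberg_sum_uniform(n, 0.5))
```

Since $|X_j| \le n$ while $s_n$ grows like $n^{3/2}/3$, the sum is exactly zero as soon as $\eta s_n > n$, which is the same mechanism used for bounded variables in the corollary to Theorem 7.1.2.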


Lindeberg’s theorem contains both the identically distributed case
(Theorem 6.4.4) and Liapounov’s theorem. Let us derive the latter, which
asserts that, under (4) and (5) of Sec. 7.1, the condition below for any one
value of $\delta > 0$ is a sufficient condition for $S_n$ to converge in dist. to $\Phi$:

$$\sum_{j=1}^{k_n} \int_{-\infty}^{\infty} |x|^{2+\delta}\,dF_{nj}(x) \to 0. \tag{10}$$

For $\delta = 1$ this condition is just (10) of Sec. 7.1. In the general case the
assertion follows at once from the following inequalities:

$$\sum_j \int_{|x|>\eta} x^2\,dF_{nj}(x) \le \sum_j \int_{|x|>\eta} \frac{|x|^{2+\delta}}{\eta^{\delta}}\,dF_{nj}(x) \le \frac{1}{\eta^{\delta}} \sum_j \int_{-\infty}^{\infty} |x|^{2+\delta}\,dF_{nj}(x),$$

showing that (10) implies (1).

The essence of Theorem 7.2.1 is the assumption of the finiteness of the
second moments together with the “classical” norming factor $s_n$, which is the
standard deviation of the sum $S_n$; see Exercise 10 below for an interesting
possibility. In the most general case where “nothing is assumed,” we have the
following criterion, due to Feller and in a somewhat different form to Lévy.

Theorem 7.2.2. For the double array (2) of Sec. 7.1 (with independence in
each row), in order that there exists a sequence of constants $\{a_n\}$ such that
(i) $\sum_{j=1}^{k_n} X_{nj} - a_n$ converges in dist. to $\Phi$, and (ii) the array is holospoudic,
it is necessary and sufficient that the following two conditions hold for every
$\eta > 0$:

(a) $\sum_{j=1}^{k_n} \int_{|x|>\eta} dF_{nj}(x) \to 0$;
(b) $\sum_{j=1}^{k_n} \left\{\int_{|x|\le\eta} x^2\,dF_{nj}(x) - \left(\int_{|x|\le\eta} x\,dF_{nj}(x)\right)^2\right\} \to 1$.

We refer to the monograph by Gnedenko and Kolmogorov [12] for this
result and its variants, as well as the following criterion for a single sequence
of independent, identically distributed r.v.’s due to Lévy.


Theorem 7.2.3. Let $\{X_j, j \ge 1\}$ be independent r.v.’s having the common
d.f. $F$; and $S_n = \sum_{j=1}^{n} X_j$. In order that there exist constants $a_n$ and $b_n > 0$
(necessarily $b_n \to +\infty$) such that $(S_n - a_n)/b_n$ converges in dist. to $\Phi$, it is
necessary and sufficient that we have, as $y \to +\infty$:

$$y^2 \int_{|x|>y} dF(x) = o\left(\int_{|x|\le y} x^2\,dF(x)\right). \tag{11}$$


The central limit theorem, applied to concrete cases, leads to asymptotic
formulas. The best-known one is perhaps the following, for $0 < p < 1$, $p + q = 1$,
and $x_1 < x_2$, as $n \to \infty$:

$$\sum_{x_1\sqrt{npq}\, <\, k - np\, \le\, x_2\sqrt{npq}} \binom{n}{k} p^k q^{n-k} \sim \Phi(x_2) - \Phi(x_1) = \frac{1}{\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-y^2/2}\,dy. \tag{12}$$

This formula, due to DeMoivre, can be derived from Stirling’s formula for
factorials in a rather messy way. But it is just an application of Theorem 6.4.4
(or 7.1.2), where each $X_j$ has the Bernoullian d.f. $p\delta_1 + q\delta_0$.
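
DeMoivre’s formula is easy to test numerically. The sketch below sums the binomial probabilities over the stated window and compares the result with $\Phi(x_2) - \Phi(x_1)$; the parameter values are arbitrary choices for illustration, and logarithms of the terms are used only to avoid overflow in the binomial coefficients.

```python
import math

def log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1 - p)^(n - k), computed through log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binomial_window(n, p, x1, x2):
    """Sum of C(n,k) p^k q^(n-k) over x1*sqrt(npq) < k - np <= x2*sqrt(npq)."""
    sd = math.sqrt(n * p * (1.0 - p))
    lo = math.floor(n * p + x1 * sd) + 1   # smallest k with k - np > x1 * sd
    hi = math.floor(n * p + x2 * sd)       # largest k with k - np <= x2 * sd
    return sum(math.exp(log_binom_pmf(n, k, p))
               for k in range(max(lo, 0), min(hi, n) + 1))

def phi(x):
    """Unit normal d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

n, p, x1, x2 = 20000, 0.3, -1.0, 1.5
print(binomial_window(n, p, x1, x2), phi(x2) - phi(x1))
```

The discrepancy is of the order of a single binomial term near the ends of the window, that is, of order $1/\sqrt{npq}$.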

More interesting and less expected applications to combinatorial analysis
will be illustrated by the following example, which incidentally demonstrates
the logical necessity of a double array even in simple situations.

Consider all $n!$ distinct permutations $(a_1, a_2, \ldots, a_n)$ of the $n$ integers
$(1, 2, \ldots, n)$. The sample space $\Omega = \Omega_n$ consists of these $n!$ points, and $\mathscr{P}$
assigns probability $1/n!$ to each of the points. For each $j$, $1 \le j \le n$, and
each $\omega = (a_1, a_2, \ldots, a_n)$ let $X_{nj}$ be the number of “inversions” caused by
$j$ in $\omega$; namely $X_{nj}(\omega) = m$ if and only if $j$ precedes exactly $m$ of the
integers $1, \ldots, j - 1$ in the permutation $\omega$. The basic structure of the sequence
$\{X_{nj}, 1 \le j \le n\}$ is contained in the lemma below.

Lemma 2. For each $n$, the r.v.’s $\{X_{nj}, 1 \le j \le n\}$ are independent with the
following distributions:

$$\mathscr{P}\{X_{nj} = m\} = \frac{1}{j} \quad \text{for } 0 \le m \le j - 1.$$

The lemma is a striking example that stochastic independence need not
be an obvious phenomenon even in the simplest problems and may require
verification as well as discovery. It is often disposed of perfunctorily but a
formal proof is lengthier than one might think. Observe first that the values of
$X_{n1}, \ldots, X_{nj}$ are determined as soon as the positions of the integers $\{1, \ldots, j\}$
are known, irrespective of those of the rest. Given $j$ arbitrary positions among
$n$ ordered slots, the number of $\omega$’s in which $\{1, \ldots, j\}$ occupy these positions in
some order is $j!(n - j)!$. Among these the number of $\omega$’s in which $j$ occupies
the $(j - m)$th place, where $0 \le m \le j - 1$, (in order from left to right) of the
given positions is $(j - 1)!(n - j)!$. This position being fixed for $j$, the integers
$\{1, \ldots, j - 1\}$ may occupy the remaining given positions in $(j - 1)!$ distinct
ways, each corresponding uniquely to a possible value of the random vector

$$\{X_{n1}, \ldots, X_{n,j-1}\}.$$

That this correspondence is 1-1 is easily seen if we note that the total
number of possible values of this vector is precisely $1 \cdot 2 \cdots (j - 1) = (j - 1)!$.
It follows that for any such value $(c_1, \ldots, c_{j-1})$ the number of
$\omega$’s in which, first, $\{1, \ldots, j\}$ occupy the given positions and, second,
$X_{n1}(\omega) = c_1, \ldots, X_{n,j-1}(\omega) = c_{j-1}$, $X_{nj}(\omega) = m$, is equal to $(n - j)!$. Hence
the number of $\omega$’s satisfying the second condition alone is equal to
$\binom{n}{j}(n - j)! = n!/j!$. Summing over $m$ from 0 to $j - 1$, we obtain the number of
$\omega$’s in which $X_{n1}(\omega) = c_1, \ldots, X_{n,j-1}(\omega) = c_{j-1}$ to be $j\,n!/j! = n!/(j - 1)!$.
Therefore we have

$$\frac{\mathscr{P}\{X_{n1} = c_1, \ldots, X_{n,j-1} = c_{j-1}, X_{nj} = m\}}{\mathscr{P}\{X_{n1} = c_1, \ldots, X_{n,j-1} = c_{j-1}\}} = \frac{n!/j!}{n!/(j-1)!} = \frac{1}{j}.$$

This, as we know, establishes the lemma. The reader is urged to see for himself
whether he can shorten this argument while remaining scrupulous.
The rest is simple algebra. We find

$$\mathscr{E}(X_{nj}) = \frac{j - 1}{2}, \qquad \sigma_{nj}^2 = \frac{j^2 - 1}{12}, \qquad \mathscr{E}(S_n) = \frac{n(n-1)}{4} \sim \frac{n^2}{4}, \qquad s_n^2 = \frac{2n^3 + 3n^2 - 5n}{72} \sim \frac{n^3}{36}.$$

For each $\eta > 0$, and sufficiently large $n$, we have

$$|X_{nj}| \le j - 1 \le n - 1 < \eta s_n.$$

Hence Lindeberg’s condition is satisfied for the double array

$$\left\{\frac{X_{nj}}{s_n};\ 1 \le j \le n,\ 1 \le n\right\}$$

(in fact the Corollary to Theorem 7.1.2 is also applicable) and we obtain the
central limit theorem:

$$\lim_{n\to\infty} \mathscr{P}\left\{\frac{S_n - n^2/4}{n^{3/2}/6} \le x\right\} = \Phi(x).$$

Here for each permutation $\omega$, $S_n(\omega) = \sum_{j=1}^{n} X_{nj}(\omega)$ is the total number of
inversions in $\omega$; and the result above asserts that among the $n!$ permutations
on $\{1, \ldots, n\}$, the number of those in which there are $\le n^2/4 + x\,n^{3/2}/6$
inversions has a proportion $\Phi(x)$, as $n \to \infty$. In particular, for example, the
number of those with $\le n^2/4$ inversions has an asymptotic proportion of $\frac{1}{2}$.
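
For small $n$ the structure asserted in Lemma 2 can be verified by brute force. The sketch below enumerates all permutations of $\{1, \ldots, n\}$, computes the vector $(X_{n1}, \ldots, X_{nn})$ for each, and checks the moments of the total number of inversions exactly with rational arithmetic. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def inversion_vector(perm):
    """(X_n1, ..., X_nn): X_nj = number of integers in {1, ..., j-1} that j precedes."""
    pos = {v: i for i, v in enumerate(perm)}
    n = len(perm)
    return tuple(sum(1 for i in range(1, j) if pos[j] < pos[i])
                 for j in range(1, n + 1))

def check(n):
    """Return the multiset of inversion vectors, and the exact mean and variance of S_n."""
    vectors = Counter(inversion_vector(p) for p in permutations(range(1, n + 1)))
    count = sum(vectors.values())
    totals = [sum(v) for v in vectors.elements()]
    mean = Fraction(sum(totals), count)
    var = Fraction(sum(t * t for t in totals), count) - mean ** 2
    return vectors, mean, var

vectors, mean, var = check(6)
print(len(vectors), set(vectors.values()), mean, var)
```

Each admissible vector $(c_1, \ldots, c_n)$ with $0 \le c_j \le j - 1$ occurs exactly once among the $n!$ permutations, which is equivalent to the independence and uniformity stated in the lemma, since then each vector has probability $\prod_j 1/j$.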


EXERCISES 


1. Restate Theorem 7.1.2 in terms of normed sums of a single sequence.

2. Prove that Lindeberg’s condition (1) implies that

$$\max_{1\le j\le k_n} \sigma_{nj} \to 0.$$

*3. Prove that in Theorem 7.2.1, (i) does not imply (1). [HINT: Consider
r.v.’s with normal distributions.]

*4. Prove the sufficiency part of Theorem 7.2.1 without using
Theorem 7.1.2, but by elaborating the proof of the latter. [HINT: Use the
expansion

$$e^{itx} = 1 + itx + \theta\,\frac{(tx)^2}{2} \quad \text{for } |x| > \eta$$

and

$$e^{itx} = 1 + itx - \frac{(tx)^2}{2} + \theta'\,\frac{|tx|^3}{6} \quad \text{for } |x| \le \eta.$$

As a matter of fact, Lindeberg’s original proof does not even use ch.f.’s; see
Feller [13, vol. 2].]

5. Derive Theorem 6.4.4 from Theorem 7.2.1.

6. Prove that if $\delta < \delta'$, then the condition (10) implies the similar one
when $\delta$ is replaced by $\delta'$.


*7. Find an example where Lindeberg’s condition is satisfied but
Liapounov’s is not for any $\delta > 0$.

In Exercises 8 to 10 below $\{X_j, j \ge 1\}$ is a sequence of independent r.v.’s.

8. For each $j$ let $X_j$ have the uniform distribution in $[-j, j]$. Show that
Lindeberg’s condition is satisfied and state the resulting central limit theorem.

9. Let $X_j$ be defined as follows for some $\alpha > 1$:

$$X_j = \begin{cases} \pm j^{\alpha}, & \text{with probability } \dfrac{1}{6\,j^{2(\alpha-1)}} \text{ each;} \\[2mm] 0, & \text{with probability } 1 - \dfrac{1}{3\,j^{2(\alpha-1)}}. \end{cases}$$

Prove that Lindeberg’s condition is satisfied if and only if $\alpha < 3/2$.

*10. It is important to realize that the failure of Lindeberg’s condition
means only the failure of either (i) or (ii) in Theorem 7.2.1 with the specified
constants $s_n$. A central limit theorem may well hold with a different sequence
of constants. Let

$$X_j = \begin{cases} \pm j^2, & \text{with probability } \dfrac{1}{12 j^2} \text{ each;} \\[2mm] \pm j, & \text{with probability } \dfrac{1}{12} \text{ each;} \\[2mm] 0, & \text{with probability } 1 - \dfrac{1}{6} - \dfrac{1}{6 j^2}. \end{cases}$$

Prove that Lindeberg’s condition is not satisfied. Nonetheless if we take $b_n^2 =
n^3/18$, then $S_n/b_n$ converges in dist. to $\Phi$. The point is that abnormally large
values may not count! [HINT: Truncate out the abnormal value.]

11. Prove that $\int_{-\infty}^{\infty} x^2\,dF(x) < \infty$ implies the condition (11), but not
vice versa.
*12. The following combinatorial problem is similar to that of the number
of inversions. Let $\Omega$ and $\mathscr{P}$ be as in the example in the text. It is standard
knowledge that each permutation

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}$$

can be uniquely decomposed into the product of cycles, as follows. Consider
the permutation as a mapping $\pi$ from the set $(1, \ldots, n)$ onto itself such
that $\pi(j) = a_j$. Beginning with 1 and applying the mapping successively,
$1 \to \pi(1) \to \pi^2(1) \to \cdots$, until the first $k$ such that $\pi^k(1) = 1$. Thus

$$(1, \pi(1), \pi^2(1), \ldots, \pi^{k-1}(1))$$

is the first cycle of the decomposition. Next, begin with the least integer,
say $b$, not in the first cycle and apply $\pi$ to it successively; and so on. We
say $1 \to \pi(1)$ is the first step, ..., $\pi^{k-1}(1) \to 1$ the $k$th step, $b \to \pi(b)$ the
$(k + 1)$st step of the decomposition, and so on. Now define $X_{nj}(\omega)$ to be equal
to 1 if in the decomposition of $\omega$, a cycle is completed at the $j$th step; otherwise
to be 0. Prove that for each $n$, $\{X_{nj}, 1 \le j \le n\}$ is a set of independent r.v.’s
with the following distributions:

$$\mathscr{P}\{X_{nj} = 1\} = \frac{1}{n - j + 1}, \qquad \mathscr{P}\{X_{nj} = 0\} = 1 - \frac{1}{n - j + 1}.$$

Deduce the central limit theorem for the number of cycles of a permutation.


7.3. Ramifications of the central limit theorem 


As an illustration of a general method of extending the central limit theorem 
to certain classes of dependent r.v.’s, we prove the following result. Further 
elaboration along the same lines is possible, but the basic idea, attributed to 
S. Bernstein, consists always in separating into blocks and neglecting small ones. 
Let $\{X_n, n \ge 1\}$ be a sequence of r.v.’s; let $\mathscr{F}_n$ be the Borel field generated
by $\{X_k, 1 \le k \le n\}$, and $\mathscr{F}_n'$ that by $\{X_k, n < k < \infty\}$. The sequence is called
$m$-dependent iff there exists an integer $m \ge 0$ such that for every $n$ the fields
$\mathscr{F}_n$ and $\mathscr{F}_{n+m}'$ are independent. When $m = 0$, this reduces to independence.


Theorem 7.3.1. Suppose that $\{X_n\}$ is a sequence of $m$-dependent, uniformly
bounded r.v.’s such that

$$\frac{\sigma(S_n)}{n^{1/3}} \to +\infty$$

as $n \to \infty$. Then $[S_n - \mathscr{E}(S_n)]/\sigma(S_n)$ converges in dist. to $\Phi$.


PROOF. Let the uniform bound be $M$. Without loss of generality we may
suppose that $\mathscr{E}(X_n) = 0$ for each $n$. For an integer $k \ge 1$ let $n_j = [jn/k]$,
$0 \le j \le k$, and put for large values of $n$:

$$Y_j = X_{n_j+1} + X_{n_j+2} + \cdots + X_{n_{j+1}-m};$$
$$Z_j = X_{n_{j+1}-m+1} + X_{n_{j+1}-m+2} + \cdots + X_{n_{j+1}}.$$

We have then

$$S_n = \sum_{j=0}^{k-1} Y_j + \sum_{j=0}^{k-1} Z_j = S'_n + S''_n, \text{ say.}$$
It follows from the hypothesis of m-dependence and Theorem 3.3.2 that the
$Y_j$'s are independent; so are the $Z_j$'s, provided $n_{j+1} - m + 1 - n_j > m$,
which is the case if $n/k$ is large enough. Although $S'_n$ and $S''_n$ are not
independent of each other, we shall show that the latter is comparatively
negligible so that $S_n$ behaves like $S'_n$. Observe that each term $X_r$ in $S''_n$
is independent of every term $X_s$ in $S'_n$ except at most $m$ terms, and that
$\mathscr{E}(X_r X_s) = 0$ when they are independent, while $|\mathscr{E}(X_r X_s)| \le M^2$ otherwise.
Since there are $km$ terms in $S''_n$, it follows that

$$|\mathscr{E}(S'_n S''_n)| \le km \cdot m \cdot M^2 = k(mM)^2.$$


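We have also

$$\mathscr{E}(S''^2_n) = \sum_{j=0}^{k-1} \mathscr{E}(Z_j^2) \le k(mM)^2.$$

From these inequalities and the identity

$$\mathscr{E}(S_n^2) = \mathscr{E}(S'^2_n) + 2\mathscr{E}(S'_n S''_n) + \mathscr{E}(S''^2_n)$$

we obtain

$$|\mathscr{E}(S_n^2) - \mathscr{E}(S'^2_n)| \le 3km^2M^2.$$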


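Now we choose $k = k_n = [n^{2/3}]$ and write $s_n^2 = \mathscr{E}(S_n^2) = \sigma^2(S_n)$, $s'^2_n =
\mathscr{E}(S'^2_n) = \sigma^2(S'_n)$. Then we have, as $n \to \infty$,

$$\frac{s'_n}{s_n} \to 1, \tag{1}$$

and

$$\mathscr{E}\left(\left(\frac{S''_n}{s_n}\right)^2\right) \to 0. \tag{2}$$

Hence, first, $S''_n/s_n \to 0$ in pr. (Theorem 4.1.4) and, second, since

$$\frac{S_n}{s_n} = \frac{s'_n}{s_n}\,\frac{S'_n}{s'_n} + \frac{S''_n}{s_n},$$

$S_n/s_n$ will converge in dist. to $\Phi$ if $S'_n/s'_n$ does.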

Since $k_n$ is a function of $n$, in the notation above $Y_j$ should be replaced
by $Y_{nj}$ to form the double array $\{Y_{nj}, 0 \le j \le k_n - 1, 1 \le n\}$, which retains
independence in each row. We have, since each $Y_{nj}$ is the sum of no more
than $[n/k_n] + 1$ of the $X_n$'s,

$$|Y_{nj}| \le \left(\frac{n}{k_n} + 1\right) M = O(n^{1/3}) = o(s_n) = o(s'_n),$$

the last relation from (1) and the one preceding it from a hypothesis of the
theorem. Thus for each $\eta > 0$, we have for all sufficiently large $n$:

$$\int_{|x| > \eta s'_n} x^2 \, dF_{nj}(x) = 0, \qquad 0 \le j \le k_n - 1,$$

where $F_{nj}$ is the d.f. of $Y_{nj}$. Hence Lindeberg's condition is satisfied for the
double array $\{Y_{nj}/s'_n\}$, and we conclude that $S'_n/s'_n$ converges in dist. to $\Phi$.
This establishes the theorem as remarked above.
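
The blocking used in the proof above (large blocks $Y_j$ separated by small blocks $Z_j$ of length $m$) can be sketched as follows. The function name `bernstein_blocks` and the inclusive `(first, last)` index pairs are illustrative assumptions; the sketch only shows which indices go into which block and does not carry out the proof.

```python
def bernstein_blocks(n, k, m):
    """Split indices 1..n into big blocks Y_j and small blocks Z_j.

    With n_j = floor(j*n/k), Y_j covers n_j+1 .. n_{j+1}-m and
    Z_j covers n_{j+1}-m+1 .. n_{j+1}, for 0 <= j <= k-1.
    Blocks are returned as inclusive (first, last) index pairs.
    """
    cuts = [(j * n) // k for j in range(k + 1)]
    big, small = [], []
    for j in range(k):
        lo, hi = cuts[j], cuts[j + 1]
        # Condition from the proof: n_{j+1} - m + 1 - n_j > m.
        if hi - m + 1 - lo <= m:
            raise ValueError("n/k is too small relative to m")
        big.append((lo + 1, hi - m))
        small.append((hi - m + 1, hi))
    return big, small
```

Consecutive big blocks are then separated by more than $m$ indices, which is what makes the $Y_j$'s independent under m-dependence.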

The next extension of the central limit theorem is to the case of a random
number of terms (cf. the second part of Sec. 5.5). That is, we shall deal with
the r.v. $S_{\nu_n}$ whose value at $\omega$ is given by $S_{\nu_n(\omega)}(\omega)$, where

$$S_n(\omega) = \sum_{j=1}^{n} X_j(\omega)$$

as before, and $\{\nu_n(\omega), n \ge 1\}$ is a sequence of r.v.'s. The simplest case,
but not a very useful one, is when all the r.v.'s in the "double family"
$\{X_n, \nu_n, n \ge 1\}$ are independent. The result below is more interesting and is
distinguished by the simple nature of its hypothesis. The proof relies essentially
on Kolmogorov's inequality given in Theorem 5.3.1.


Theorem 7.3.2. Let $\{X_j, j \ge 1\}$ be a sequence of independent, identically
distributed r.v.'s with mean 0 and variance 1. Let $\{\nu_n, n \ge 1\}$ be a sequence
of r.v.'s taking only strictly positive integer values such that

$$\frac{\nu_n}{n} \to c \quad \text{in pr.,} \tag{3}$$

where $c$ is a constant: $0 < c < \infty$. Then $S_{\nu_n}/\sqrt{\nu_n}$ converges in dist. to $\Phi$.


PROOF. We know from Theorem 6.4.4 that $S_n/\sqrt{n}$ converges in dist. to
$\Phi$, so that our conclusion means that we can substitute $\nu_n$ for $n$ there. The
remarkable thing is that no kind of independence is assumed, but only the limit
property in (3). First of all, we observe that in the result of Theorem 6.4.4 we
may substitute $[cn]$ (= integer part of $cn$) for $n$ to conclude the convergence
in dist. of $S_{[cn]}/\sqrt{[cn]}$ to $\Phi$ (why?). Next we write

$$\frac{S_{\nu_n}}{\sqrt{\nu_n}} = \left(\frac{S_{[cn]}}{\sqrt{[cn]}} + \frac{S_{\nu_n} - S_{[cn]}}{\sqrt{[cn]}}\right)\sqrt{\frac{[cn]}{\nu_n}}.$$

The second factor on the right converges to 1 in pr., by (3). Hence a simple
argument used before (Theorem 4.4.6) shows that the theorem will be proved
if we show that

$$\frac{S_{\nu_n} - S_{[cn]}}{\sqrt{[cn]}} \to 0 \quad \text{in pr.} \tag{4}$$
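Let $\epsilon$ be given, $0 < \epsilon < 1$; put

$$a_n = [(1 - \epsilon^3)[cn]], \qquad b_n = [(1 + \epsilon^3)[cn]] - 1.$$

By (3), there exists $n_0(\epsilon)$ such that if $n \ge n_0(\epsilon)$, then the set

$$\Lambda = \{\omega: a_n \le \nu_n(\omega) \le b_n\}$$

has probability $\ge 1 - \epsilon$. If $\omega$ is in this set, then $S_{\nu_n(\omega)}(\omega)$ is one of the sums
$S_j$ with $a_n \le j \le b_n$. For $[cn] < j \le b_n$, we have

$$S_j - S_{[cn]} = X_{[cn]+1} + X_{[cn]+2} + \cdots + X_j;$$

hence by Kolmogorov's inequality

$$\mathscr{P}\left\{\max_{[cn] < j \le b_n} |S_j - S_{[cn]}| > \epsilon\sqrt{[cn]}\right\} \le \frac{\sigma^2(S_{b_n} - S_{[cn]})}{\epsilon^2 [cn]} \le \frac{\epsilon^3 [cn]}{\epsilon^2 [cn]} \le \epsilon.$$

A similar inequality holds for $a_n \le j < [cn]$; combining the two, we obtain

$$\mathscr{P}\left\{\max_{a_n \le j \le b_n} |S_j - S_{[cn]}| > \epsilon\sqrt{[cn]}\right\} \le 2\epsilon.$$

Now we have, if $n \ge n_0(\epsilon)$:

$$\mathscr{P}\left\{\left|\frac{S_{\nu_n} - S_{[cn]}}{\sqrt{[cn]}}\right| > \epsilon\right\} = \sum_{j=1}^{\infty} \mathscr{P}\left\{\nu_n = j; \left|\frac{S_j - S_{[cn]}}{\sqrt{[cn]}}\right| > \epsilon\right\}$$
$$\le \sum_{a_n \le j \le b_n} \mathscr{P}\left\{\nu_n = j; \max_{a_n \le i \le b_n} |S_i - S_{[cn]}| > \epsilon\sqrt{[cn]}\right\} + \sum_{j \notin [a_n, b_n]} \mathscr{P}\{\nu_n = j\}$$
$$\le \mathscr{P}\left\{\max_{a_n \le j \le b_n} |S_j - S_{[cn]}| > \epsilon\sqrt{[cn]}\right\} + \mathscr{P}\{\nu_n \notin [a_n, b_n]\}$$
$$\le 2\epsilon + 1 - \mathscr{P}\{\Lambda\} \le 3\epsilon.$$

Since $\epsilon$ is arbitrary, this proves (4) and consequently the theorem.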


As a third, somewhat deeper application of the central limit theorem, we 
give an instance of another limiting distribution inherently tied up with the 
normal. In this application the role of the central limit theorem, indeed in 
high dimensions, is that of an underlying source giving rise to multifarious 
manifestations. The topic really belongs to the theory of Brownian motion 
process, to which one must turn for a true understanding. 
Let $\{X_j, j \ge 1\}$ be a sequence of independent and identically distributed
r.v.'s with mean 0 and variance 1; and

$$S_n = \sum_{j=1}^{n} X_j.$$


It will become evident that these assumptions may be considerably weakened, 
and the basic hypothesis is that the central limit theorem should be applicable. 
Consider now the infinite sequence of successive sums $\{S_n, n \ge 1\}$. This is
the kind of stochastic process to which an entire chapter will be devoted later
(Chapter 8). The classical limit theorems we have so far discussed deal with
the individual terms of the sequence $\{S_n\}$ itself, but there are various other
sequences derived from it that are no less interesting and useful. To give a
few examples:

$$\max_{1 \le m \le n} S_m, \quad \min_{1 \le m \le n} S_m, \quad \max_{1 \le m \le n} |S_m|, \quad \max_{1 \le m \le n} \frac{|S_m|}{\sqrt{m}},$$
$$\sum_{m=1}^{n} \delta_a(S_m), \quad \sum_{m=1}^{n} \gamma(S_m, S_{m+1});$$

where $\gamma(a, b) = 1$ if $ab < 0$ and 0 otherwise. Thus the last two examples
represent, respectively, the "number of sums $\ge a$" and the "number of changes
of sign". Now the central idea, originating with Erdős and Kac, is that the
asymptotic behavior of these functionals of $S_n$ should be the same regardless of
the special properties of $\{X_j\}$, so long as the central limit theorem applies to it
(at least when certain regularity conditions are satisfied, such as the finiteness
of a higher moment). Thus, in order to obtain the asymptotic distribution of
one of these functionals one may calculate it in a very particular case where
the calculations are feasible. We shall illustrate this method, which has been
called an "invariance principle", by carrying it out in the case of $\max S_m$; for
other cases see Exercises 6 and 7 below.
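Let us therefore put, for a given $x$:

$$P_n(x) = \mathscr{P}\left\{\max_{1 \le m \le n} S_m \le x\sqrt{n}\right\}.$$

For an integer $k \ge 1$ let $n_j = [jn/k]$, $0 \le j \le k$, and define

$$R_{nk}(x) = \mathscr{P}\left\{\max_{1 \le j \le k} S_{n_j} \le x\sqrt{n}\right\}.$$

Let also

$$E_j = \{\omega: S_m(\omega) \le x\sqrt{n}, 1 \le m < j; S_j(\omega) > x\sqrt{n}\};$$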
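and for each $j$, define $\ell(j)$ by

$$n_{\ell(j)-1} < j \le n_{\ell(j)}.$$

Now we write, for $0 < \epsilon < x$:

$$\mathscr{P}\left\{\max_{1 \le m \le n} S_m > x\sqrt{n}\right\} = \sum_{j=1}^{n} \mathscr{P}\{E_j; |S_{n_{\ell(j)}} - S_j| \le \epsilon\sqrt{n}\}$$
$$+ \sum_{j=1}^{n} \mathscr{P}\{E_j; |S_{n_{\ell(j)}} - S_j| > \epsilon\sqrt{n}\} = \sum_1 + \sum_2,$$

say. Since $E_j$ is independent of $\{|S_{n_{\ell(j)}} - S_j| > \epsilon\sqrt{n}\}$ and $\sigma^2(S_{n_{\ell(j)}} - S_j) \le
n/k$, we have by Chebyshev's inequality:

$$\sum_2 \le \sum_{j=1}^{n} \mathscr{P}(E_j)\,\frac{n}{\epsilon^2 n k} \le \frac{1}{\epsilon^2 k}.$$

On the other hand, since $S_j > x\sqrt{n}$ and $|S_{n_{\ell(j)}} - S_j| \le \epsilon\sqrt{n}$ imply $S_{n_{\ell(j)}} >
(x - \epsilon)\sqrt{n}$, we have

$$\sum_1 \le \mathscr{P}\left\{\max_{1 \le \ell \le k} S_{n_\ell} > (x - \epsilon)\sqrt{n}\right\} = 1 - R_{nk}(x - \epsilon).$$

It follows that

$$P_n(x) = 1 - \sum_1 - \sum_2 \ge R_{nk}(x - \epsilon) - \frac{1}{\epsilon^2 k}. \tag{5}$$

Since it is trivial that $P_n(x) \le R_{nk}(x)$, we obtain from (5) the following
inequalities:

$$P_n(x) \le R_{nk}(x) \le P_n(x + \epsilon) + \frac{1}{\epsilon^2 k}. \tag{6}$$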
We shall show that for fixed $x$ and $k$, $\lim_{n \to \infty} R_{nk}(x)$ exists. Since

$$R_{nk}(x) = \mathscr{P}\{S_{n_1} \le x\sqrt{n}, S_{n_2} \le x\sqrt{n}, \ldots, S_{n_k} \le x\sqrt{n}\},$$

it is sufficient to show that the sequence of k-dimensional random vectors

$$\left(\sqrt{\frac{k}{n}}\,S_{n_1}, \sqrt{\frac{k}{n}}\,S_{n_2}, \ldots, \sqrt{\frac{k}{n}}\,S_{n_k}\right)$$

converges in distribution as $n \to \infty$. Now its ch.f. $f(t_1, \ldots, t_k)$ is given by

$$\mathscr{E}\{\exp(i\sqrt{k/n}\,(t_1 S_{n_1} + \cdots + t_k S_{n_k}))\} = \mathscr{E}\{\exp(i\sqrt{k/n}\,[(t_1 + \cdots + t_k) S_{n_1}$$
$$+ (t_2 + \cdots + t_k)(S_{n_2} - S_{n_1}) + \cdots + t_k (S_{n_k} - S_{n_{k-1}})])\},$$

which converges to

$$\exp\left[-\tfrac{1}{2}(t_1 + \cdots + t_k)^2\right] \exp\left[-\tfrac{1}{2}(t_2 + \cdots + t_k)^2\right] \cdots \exp\left(-\tfrac{1}{2} t_k^2\right); \tag{7}$$

since the ch.f.'s of

$$\sqrt{\frac{k}{n}}\,S_{n_1}, \quad \sqrt{\frac{k}{n}}\,(S_{n_2} - S_{n_1}), \quad \ldots, \quad \sqrt{\frac{k}{n}}\,(S_{n_k} - S_{n_{k-1}})$$

all converge to $e^{-t^2/2}$ by the central limit theorem (Theorem 6.4.4), $n_{j+1} -
n_j$ being asymptotically equal to $n/k$ for each $j$. It is well known that the
ch.f. given in (7) is that of the k-dimensional normal distribution, but for our
purpose it is sufficient to know that the convergence theorem for ch.f.'s holds
in any dimension, and so $R_{nk}$ converges vaguely to $R_{\infty k}$, where $R_{\infty k}$ is some
fixed k-dimensional distribution.

Now suppose for a special sequence $\{\tilde{X}_j\}$ satisfying the same conditions
as $\{X_j\}$, the corresponding $\tilde{P}_n$ can be shown to converge ("pointwise" in fact,
but "vaguely" if need be):

$$\forall x: \lim_{n \to \infty} \tilde{P}_n(x) = G(x). \tag{8}$$

Then, applying (6) with $P_n$ replaced by $\tilde{P}_n$ and letting $n \to \infty$, we obtain,
since $R_{\infty k}$ is a fixed distribution:

$$G(x) \le R_{\infty k}(x) \le G(x + \epsilon) + \frac{1}{\epsilon^2 k}.$$

Substituting back into the original (6) and taking upper and lower limits,
we have

$$G(x - \epsilon) - \frac{1}{\epsilon^2 k} \le R_{\infty k}(x - \epsilon) - \frac{1}{\epsilon^2 k} \le \liminf_{n} P_n(x)$$
$$\le \limsup_{n} P_n(x) \le R_{\infty k}(x) \le G(x + \epsilon) + \frac{1}{\epsilon^2 k}.$$

Letting $k \to \infty$, we conclude that $P_n$ converges vaguely to $G$, since $\epsilon$ is
arbitrary.
It remains to prove (8) for a special choice of $\{\tilde{X}_j\}$ and determine $G$. This
can be done most expeditiously by taking the common d.f. of the $\tilde{X}_j$'s to be
the symmetric Bernoullian $\frac{1}{2}(\delta_1 + \delta_{-1})$. In this case we can indeed compute
the more specific probability

$$\mathscr{P}\left\{\max_{1 \le m \le n} S_m < x; S_n = y\right\}, \tag{9}$$

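where $x$ and $y$ are two integers such that $x > 0$, $x > y$. If we observe that
in our particular case $\max_{1 \le m \le n} S_m \ge x$ if and only if $S_j = x$ for some $j$,
$1 \le j \le n$, the probability in (9) is seen to be equal to

$$\mathscr{P}\{S_n = y\} - \mathscr{P}\left\{\max_{1 \le m \le n} S_m \ge x; S_n = y\right\}$$
$$= \mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x; S_n = y\}$$
$$= \mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x; S_n - S_j = y - x\}$$
$$= \mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x\}\,\mathscr{P}\{S_n - S_j = y - x\},$$

where the last step is by independence. Now, the r.v.

$$S_n - S_j = \sum_{m=j+1}^{n} X_m$$

being symmetric, we have $\mathscr{P}\{S_n - S_j = y - x\} = \mathscr{P}\{S_n - S_j = x - y\}$.
Substituting this and reversing the steps, we obtain

$$\mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x\}\,\mathscr{P}\{S_n - S_j = x - y\}$$
$$= \mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x; S_n - S_j = x - y\}$$
$$= \mathscr{P}\{S_n = y\} - \sum_{j=1}^{n} \mathscr{P}\{S_m < x, 1 \le m < j; S_j = x; S_n = 2x - y\}$$
$$= \mathscr{P}\{S_n = y\} - \mathscr{P}\left\{\max_{1 \le m \le n} S_m \ge x; S_n = 2x - y\right\}.$$

Since $2x - y > x$, $S_n = 2x - y$ implies $\max_{1 \le m \le n} S_m \ge x$, hence the last line
reduces to

$$\mathscr{P}\{S_n = y\} - \mathscr{P}\{S_n = 2x - y\}, \tag{10}$$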
and we have proved that the value of the probability in (9) is given by
(10). The trick used above, which consists in changing the sign of every $X_j$
after the first time when $S_n$ reaches the value $x$, or, geometrically speaking,
reflecting the path $\{(j, S_j), j \ge 1\}$ about the line $S_j = x$ upon first reaching it,
is called the "reflection principle". It is often attributed to Désiré André in its
combinatorial formulation of the so-called "ballot problem" (see Exercise 5
below).
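
The identity between (9) and (10) can be checked by brute force for small $n$. The sketch below enumerates all $2^n$ paths of the symmetric Bernoullian walk; the function names are illustrative assumptions, and counts of paths (probabilities multiplied by $2^n$) are used to stay in exact integer arithmetic.

```python
from itertools import product
from math import comb

def count_paths_max_below(n, x, y):
    """Count +-1 paths of length n with max_{1<=m<=n} S_m < x and S_n = y."""
    count = 0
    for steps in product((1, -1), repeat=n):
        s = 0
        stays_below = True
        for step in steps:
            s += step
            if s >= x:               # the path reaches the level x
                stays_below = False
                break
        if stays_below and s == y:
            count += 1
    return count

def paths_ending_at(n, y):
    """Number of +-1 paths of length n with S_n = y (0 if impossible)."""
    if abs(y) > n or (n + y) % 2:
        return 0
    return comb(n, (n + y) // 2)

def reflection_count(n, x, y):
    """The expression (10) multiplied by 2**n."""
    return paths_ending_at(n, y) - paths_ending_at(n, 2 * x - y)
```

For integers $x > 0$ and $y < x$, `count_paths_max_below(n, x, y)` and `reflection_count(n, x, y)` should agree.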

The value of (10) is, of course, well known in the Bernoullian case, and
summing over $y$ we obtain, if $n$ is even:

$$\mathscr{P}\left\{\max_{1 \le m \le n} S_m < x\right\} = \sum_{y < x} \frac{1}{2^n}\left\{\binom{n}{\frac{n-y}{2}} - \binom{n}{\frac{n-2x+y}{2}}\right\},$$

where $\binom{n}{j} = 0$ if $|j| > n$ or if $j$ is not an integer. Replacing $x$ by $x\sqrt{n}$ (or
$[x\sqrt{n}]$ if one is pedantic) in the last expression, and using the central limit
theorem for the Bernoullian case in the form of (12) of Sec. 7.2 with $p = q =
\frac{1}{2}$, we see that the preceding probability tends to the limit

$$\frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-y^2/2}\,dy = \sqrt{\frac{2}{\pi}} \int_{0}^{x} e^{-y^2/2}\,dy$$

as $n \to \infty$. It should be obvious, even without a similar calculation, that the
same limit obtains for odd values of $n$. Finally, since this limit as a function
of $x$ is a d.f. with support in $(0, \infty)$, the corresponding limit for $x \le 0$ must
be 0. We state the final result below.


Theorem 7.3.3. Let $\{X_j, j \ge 1\}$ be independent and identically distributed
r.v.'s with mean 0 and variance 1, then $(\max_{1 \le m \le n} S_m)/\sqrt{n}$ converges in dist.
to the "positive normal d.f." $G$, where

$$\forall x: G(x) = (2\Phi(x) - 1) \vee 0.$$
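
As a numerical illustration of Theorem 7.3.3 in the symmetric Bernoullian case, the exact probability obtained by summing (10) over $y$ can be compared with $G$. This is a sketch under that special assumption only; the names `prob_max_below` and `positive_normal_df` are introduced here for illustration.

```python
from math import comb, erf, sqrt

def prob_max_below(n, x):
    """Exact P{max_{1<=m<=n} S_m < x} for the symmetric +-1 walk, integer x >= 1.

    Sums the reflection formula (10) over all integers y < x.
    """
    def p_end(y):
        # P{S_n = y}, zero when y has the wrong parity or |y| > n.
        if abs(y) > n or (n + y) % 2:
            return 0.0
        return comb(n, (n + y) // 2) / 2 ** n
    return sum(p_end(y) - p_end(2 * x - y) for y in range(-n, x))

def positive_normal_df(x):
    """G(x) = (2*Phi(x) - 1) v 0, with Phi the unit normal d.f."""
    phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return max(2.0 * phi - 1.0, 0.0)
```

For instance, `prob_max_below(400, 20)` corresponds to the level $x\sqrt{n}$ with $x = 1$, $n = 400$, and should be close to `positive_normal_df(1.0)`.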
EXERCISES


1. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s, and $f$ a Borel
measurable function of $m$ variables. Then if $\xi_k = f(X_{k+1}, \ldots, X_{k+m})$, the
sequence $\{\xi_k, k \ge 1\}$ is $(m - 1)$-dependent.

*2. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s having the
Bernoullian d.f. $p\delta_1 + (1 - p)\delta_0$, $0 < p < 1$. An r-run of successes in the
sample sequence $\{X_j(\omega), j \ge 1\}$ is defined to be a sequence of $r$ consecutive
"ones" preceded and followed by "zeros". Let $N_n$ be the number of r-runs in
the first $n$ terms of the sample sequence. Prove a central limit theorem for $N_n$.

3. Let $\{X_j, \nu_j, j \ge 1\}$ be independent r.v.'s such that the $\nu_j$'s are integer-
valued, $\nu_j \to \infty$ a.e., and the central limit theorem applies to $(S_n - a_n)/b_n$,
where $S_n = \sum_{j=1}^{n} X_j$, $a_n$, $b_n$ are real constants, $b_n \to \infty$. Then it also applies
to $(S_{\nu_n} - a_{\nu_n})/b_{\nu_n}$.

*4. Give an example of a sequence of independent and identically
distributed r.v.'s $\{X_n\}$ with mean 0 and variance 1 and a sequence of positive
integer-valued r.v.'s $\nu_n$ tending to $\infty$ a.e. such that $S_{\nu_n}/s_{\nu_n}$ does not converge
in distribution. [HINT: The easiest way is to use Theorem 8.3.3 below.]

*5. There are $a$ ballots marked A and $b$ ballots marked B. Suppose that
these $a + b$ ballots are counted in random order. What is the probability that
the number of ballots for A always leads in the counting?

6. If $\{X_n\}$ are independent, identically distributed symmetric r.v.'s, then
for every $x \ge 0$,

$$\mathscr{P}\{|S_n| > x\} \ge \tfrac{1}{2}\mathscr{P}\left\{\max_{1 \le k \le n} |X_k| > x\right\} \ge \tfrac{1}{2}\left[1 - e^{-n\mathscr{P}\{|X_1| > x\}}\right].$$

7. Deduce from Exercise 6 that for a symmetric stable r.v. $X$ with
exponent $\alpha$, $0 < \alpha < 2$ (see Sec. 6.5), there exists a constant $c > 0$ such that
$\mathscr{P}\{|X| > n^{1/\alpha}\} \ge c/n$. [This is due to Feller; use Exercise 3 of Sec. 6.5.]

8. Under the same hypothesis as in Theorem 7.3.3, prove that
$\max_{1 \le m \le n} |S_m|/\sqrt{n}$ converges in dist. to the d.f. $H$, where $H(x) = 0$ for
$x \le 0$ and

$$H(x) = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} \exp\left[-\frac{(2k+1)^2\pi^2}{8x^2}\right] \quad \text{for } x > 0.$$

[HINT: There is no difficulty in proving that the limiting distribution is the
same for any sequence satisfying the hypothesis. To find it for the symmetric
Bernoullian case, show that for $0 < z < x$ we have

$$\mathscr{P}\left\{-z < \min_{1 \le m \le n} S_m \le \max_{1 \le m \le n} S_m < x - z; S_n = y - z\right\}$$
$$= \frac{1}{2^n} \sum_{k=-\infty}^{\infty} \left\{\binom{n}{\frac{n+2kx+y-z}{2}} - \binom{n}{\frac{n+2kx-y-z}{2}}\right\}.$$

This can be done by starting the sample path at $z$ and reflecting at both barriers
0 and $x$ (Kelvin's method of images). Next, show that

$$\lim_{n \to \infty} \mathscr{P}\{-z\sqrt{n} < S_m < (x - z)\sqrt{n} \text{ for } 1 \le m \le n\}$$
$$= \frac{1}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} \left\{\int_{2kx-z}^{(2k+1)x-z} - \int_{(2k-1)x-z}^{2kx-z}\right\} e^{-y^2/2}\,dy.$$

Finally, use the Fourier series for the function $h$ of period $2x$:

$$h(y) = \begin{cases} -1, & \text{if } -x - z < y < -z; \\ +1, & \text{if } -z < y < x - z; \end{cases}$$

to convert the above limit to

$$\frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\frac{(2k+1)\pi z}{x} \exp\left[-\frac{(2k+1)^2\pi^2}{2x^2}\right].$$

This gives the asymptotic joint distribution of

$$\min_{1 \le m \le n} S_m \quad \text{and} \quad \max_{1 \le m \le n} S_m,$$

of which that of $\max_{1 \le m \le n} |S_m|$ is a particular case.]


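9. Let $\{X_j, j \ge 1\}$ be independent r.v.'s with the symmetric Bernoullian
distribution. Let $N_n(\omega)$ be the number of zeros in the first $n$ terms of the
sample sequence $\{S_j(\omega), j \ge 1\}$. Prove that $N_n/\sqrt{n}$ converges in dist. to the
same $G$ as in Theorem 7.3.3. [HINT: Use the method of moments. Show that
for each integer $r \ge 1$:

$$\mathscr{E}(N_n^r) \sim r! \sum_{0 < j_1 < \cdots < j_r \le n/2} p_{2j_1}\,p_{2(j_2 - j_1)} \cdots p_{2(j_r - j_{r-1})}$$

where

$$p_{2j} = \mathscr{P}\{S_{2j} = 0\} = \frac{1}{2^{2j}}\binom{2j}{j} \sim \frac{1}{\sqrt{\pi j}}$$

as $j \to \infty$. To evaluate the multiple sum, say $\sum(r)$, use induction on $r$ as
follows. If

$$\sum(r) \sim c_r \left(\frac{n}{2}\right)^{r/2}$$

as $n \to \infty$, then

$$\sum(r+1) \sim \frac{c_r}{\sqrt{\pi}} \int_0^1 z^{-1/2}(1 - z)^{r/2}\,dz \left(\frac{n}{2}\right)^{(r+1)/2}.$$

Thus

$$c_{r+1} = c_r\,\frac{\Gamma\left(\frac{r}{2} + 1\right)}{\Gamma\left(\frac{r+1}{2} + 1\right)}.$$

Finally

$$\mathscr{E}\left\{\left(\frac{N_n}{\sqrt{n}}\right)^r\right\} \to \frac{\Gamma(r+1)}{2^{r/2}\,\Gamma\left(\frac{r}{2} + 1\right)} = \frac{2^{r/2}}{\sqrt{\pi}}\,\Gamma\left(\frac{r+1}{2}\right) = \int_0^\infty x^r\,dG(x).$$

This result remains valid if the common d.f. $F$ of $X_j$ is of the integer lattice
type with mean 0 and variance 1. If $F$ is not of the lattice type, no $S_n$ need
ever be zero; but the "next nearest thing", to wit the number of changes of
sign of $S_n$, is asymptotically distributed as $G$, at least under the additional
assumption of a finite third absolute moment.]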


7.4 Error estimation 


Questions of convergence lead inevitably to the question of the "speed" of
convergence, in other words, to an investigation of the difference between
the approximating expression and its limit. Specifically, if a sequence of d.f.'s
$F_n$ converge to the unit normal d.f. $\Phi$, as in the central limit theorem, what
can one say about the "remainder term" $F_n(x) - \Phi(x)$? An adequate estimate
of this term is necessary in many mathematical applications, as well as for
numerical computation. Under Liapounov's condition there is a neat "order
bound" due to Berry and Esseen, who improved upon Liapounov's older result,
as follows.


Theorem 7.4.1. Under the hypotheses of Theorem 7.1.2, there is a universal
constant $A_0$ such that

$$\sup_x |F_n(x) - \Phi(x)| \le A_0 \Gamma_n \tag{1}$$

where $F_n$ is the d.f. of $S_n$.
In the case of a single sequence of independent and identically distributed
r.v.'s $\{X_j, j \ge 1\}$ with mean 0, variance $\sigma^2$, and third absolute moment $\gamma < \infty$,
the right side of (1) reduces to

$$A_0\,\frac{n\gamma}{(n\sigma^2)^{3/2}} = \frac{A_0\gamma}{\sigma^3}\,\frac{1}{n^{1/2}}.$$

H. Cramér and P. L. Hsu have shown that under somewhat stronger conditions,
one may even obtain an asymptotic expansion of the form:

$$F_n(x) = \Phi(x) + \frac{H_1(x)}{n^{1/2}} + \frac{H_2(x)}{n} + \frac{H_3(x)}{n^{3/2}} + \cdots,$$
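
To see the order $n^{-1/2}$ of the bound in a concrete case, the sketch below computes $\sup_x |F_n(x) - \Phi(x)|$ exactly for the symmetric Bernoullian case $X_j = \pm 1$ (so that $\sigma = 1$ and $\gamma = 1$), where $F_n$ is the d.f. of $S_n/\sqrt{n}$. The function names are illustrative assumptions, and nothing here determines the constant $A_0$; the computation only shows that $\sqrt{n}$ times the supremum stays bounded.

```python
from math import comb, erf, sqrt

def normal_df(x):
    """Unit normal d.f. Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sup_distance_symmetric_bernoulli(n):
    """sup_x |F_n(x) - Phi(x)|, F_n the d.f. of S_n/sqrt(n) with X_j = +-1.

    F_n is a step function and Phi is increasing, so the supremum is
    approached at a jump point, from the left (F_n(x-)) or the right (F_n(x)).
    """
    worst = 0.0
    cumulative = 0.0                        # F_n just to the left of the jump
    for heads in range(n + 1):
        x = (2 * heads - n) / sqrt(n)       # atom of S_n / sqrt(n)
        mass = comb(n, heads) / 2 ** n
        worst = max(worst, abs(cumulative - normal_df(x)))
        cumulative += mass                  # F_n at the jump point itself
        worst = max(worst, abs(cumulative - normal_df(x)))
    return worst
```

For even $n$ the jump of $F_n$ at 0 straddles $\Phi(0) = \frac{1}{2}$, so `sqrt(n) * sup_distance_symmetric_bernoulli(n)` stays near $1/\sqrt{2\pi}$.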


where the H’s are explicit functions involving the Hermite polynomials. We 
shall not go into this, as the basic method of obtaining such an expansion is 
similar to the proof of the preceding theorem, although considerable technical 
complications arise. For this and other variants of the problem see Cramér 
[10], Gnedenko and Kolmogorov [12], and Hsu’s paper cited at the end of 
this chapter. 

We shall give the proof of Theorem 7.4.1 in a series of lemmas, in which
the machinery of operating with ch.f.'s is further exploited and found to be
efficient.


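Lemma 1. Let $F$ be a d.f., $G$ a real-valued function satisfying the conditions
below:


(i) $\lim_{x \to -\infty} G(x) = 0$, $\lim_{x \to +\infty} G(x) = 1$;
(ii) $G$ has a derivative that is bounded everywhere: $\sup_x |G'(x)| \le M$.


Set

$$\Delta = \frac{1}{2M} \sup_x |F(x) - G(x)|. \tag{2}$$

Then there exists a real number $a$ such that we have for every $T > 0$:

$$2MT\Delta \left\{3 \int_0^{T\Delta} \frac{1 - \cos x}{x^2}\,dx - \pi\right\} \le \left|\int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2}\{F(x + a) - G(x + a)\}\,dx\right|. \tag{3}$$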


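PROOF. Clearly the $\Delta$ in (2) is finite, since $G$ is everywhere bounded by
(i) and (ii). We may suppose that the left member of (3) is strictly positive, for
otherwise there is nothing to prove; hence $\Delta > 0$. Since $F - G$ vanishes at
$\pm\infty$ by (i), there exists a sequence of numbers $\{x_n\}$ converging to a finite limit
$b$ such that $F(x_n) - G(x_n)$ converges to $2M\Delta$ or $-2M\Delta$. Hence either $F(b) -
G(b) = 2M\Delta$ or $F(b-) - G(b) = -2M\Delta$. The two cases being similar, we
shall treat the second one. Put $a = b - \Delta$; then if $|x| < \Delta$, we have by (ii)
and the mean value theorem of differential calculus:

$$G(x + a) \ge G(b) + (x - \Delta)M$$

and consequently

$$F(x + a) - G(x + a) \le F(b-) - [G(b) + (x - \Delta)M] = -M(x + \Delta).$$

It follows that

$$\int_{-\Delta}^{\Delta} \frac{1 - \cos Tx}{x^2}\{F(x + a) - G(x + a)\}\,dx \le -M \int_{-\Delta}^{\Delta} \frac{1 - \cos Tx}{x^2}(x + \Delta)\,dx$$
$$= -2M\Delta \int_0^{\Delta} \frac{1 - \cos Tx}{x^2}\,dx;$$

$$\left|\left\{\int_{-\infty}^{-\Delta} + \int_{\Delta}^{\infty}\right\} \frac{1 - \cos Tx}{x^2}\{F(x + a) - G(x + a)\}\,dx\right|$$
$$\le 2M\Delta \left(\int_{-\infty}^{-\Delta} + \int_{\Delta}^{\infty}\right) \frac{1 - \cos Tx}{x^2}\,dx = 4M\Delta \int_{\Delta}^{\infty} \frac{1 - \cos Tx}{x^2}\,dx.$$

Adding these inequalities, we obtain

$$\int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2}\{F(x + a) - G(x + a)\}\,dx \le 2M\Delta \left\{-\int_0^{\Delta} + 2\int_{\Delta}^{\infty}\right\} \frac{1 - \cos Tx}{x^2}\,dx$$
$$= 2M\Delta \left\{-3\int_0^{\Delta} + 2\int_0^{\infty}\right\} \frac{1 - \cos Tx}{x^2}\,dx.$$

This reduces to (3), since

$$\int_0^{\infty} \frac{1 - \cos Tx}{x^2}\,dx = \frac{\pi T}{2}$$

by (3) of Sec. 6.2, provided that $T$ is so large that the left member of (3) is
positive; otherwise (3) is trivial.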


Lemma 2. In addition to the assumptions of Lemma 1, we assume that


(iii) $G$ is of bounded variation in $(-\infty, \infty)$;
(iv) $\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty$.


Let

$$f(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \qquad g(t) = \int_{-\infty}^{\infty} e^{itx}\,dG(x).$$

Then we have

$$\Delta \le \frac{1}{\pi M} \int_0^T \frac{|f(t) - g(t)|}{t}\,dt + \frac{12}{\pi T}. \tag{4}$$


PROOF. That the integral on the right side of (4) is finite will soon be
apparent, and as a Lebesgue integral the value of the integrand at $t = 0$ may
be overlooked. We have, by a partial integration, which is permissible on
account of condition (iii):

$$f(t) - g(t) = -it \int_{-\infty}^{\infty} \{F(x) - G(x)\} e^{itx}\,dx; \tag{5}$$

and consequently

$$\frac{f(t) - g(t)}{-it}\,e^{-ita} = \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\} e^{itx}\,dx.$$

In particular, the left member above is bounded for all $t \ne 0$ by condition (iv).
Multiplying by $T - |t|$ and integrating, we obtain

$$\int_{-T}^{T} \frac{f(t) - g(t)}{-it}\,e^{-ita}(T - |t|)\,dt = \int_{-T}^{T} \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\} e^{itx}(T - |t|)\,dx\,dt. \tag{6}$$

We may invert the repeated integral by Fubini's theorem and condition (iv),
and obtain (cf. Exercise 2 of Sec. 6.2):

$$\left|\int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\}\,\frac{1 - \cos Tx}{x^2}\,dx\right| \le T \int_0^T \frac{|f(t) - g(t)|}{t}\,dt.$$

In conjunction with (3), this yields

$$2M\Delta \left\{3 \int_0^{T\Delta} \frac{1 - \cos x}{x^2}\,dx - \pi\right\} \le \int_0^T \frac{|f(t) - g(t)|}{t}\,dt. \tag{7}$$

The quantity in braces in (7) is not less than

$$3 \int_0^{\infty} \frac{1 - \cos x}{x^2}\,dx - 3 \int_{T\Delta}^{\infty} \frac{2}{x^2}\,dx - \pi = \frac{\pi}{2} - \frac{6}{T\Delta}.$$

Using this in (7), we obtain (4).


Lemma 2 bounds the maximum difference between two d.f.'s satisfying
certain regularity conditions by means of a certain average difference between
their ch.f.'s (it is stated for a function of bounded variation, since this is needed
for the asymptotic expansion mentioned above). This should be compared with
the general discussion around Theorem 6.3.4. We shall now apply the lemma
to the specific case of $F_n$ and $\Phi$ in Theorem 7.4.1. Let the ch.f. of $F_n$ be $f_n$,
so that

$$f_n(t) = \prod_{j=1}^{k_n} f_{nj}(t).$$

Lemma 3. For $|t| < 1/(2\Gamma_n^{1/3})$, we have

$$|f_n(t) - e^{-t^2/2}| \le \Gamma_n |t|^3 e^{-t^2/2}. \tag{8}$$


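PROOF. We shall denote by $\theta$ below a "generic" complex number with
$|\theta| \le 1$, otherwise unspecified and not necessarily the same at each appearance.
By Taylor's expansion:

$$f_{nj}(t) = 1 - \frac{\sigma_{nj}^2}{2}t^2 + \theta\,\frac{\gamma_{nj}}{6}t^3.$$

For the range of $t$ given in the lemma, we have by Liapounov's inequality:

$$|\sigma_{nj} t| \le |\gamma_{nj}^{1/3} t| \le |\Gamma_n^{1/3} t| < \tfrac{1}{2}, \tag{9}$$

so that

$$\left|-\frac{\sigma_{nj}^2}{2}t^2 + \theta\,\frac{\gamma_{nj}}{6}t^3\right| \le \frac{1}{8} + \frac{1}{48} < \frac{1}{4}.$$

Using (8) of Sec. 7.1 with $\Lambda = \theta/2$, we may write

$$\log f_{nj}(t) = -\frac{\sigma_{nj}^2}{2}t^2 + \theta\,\frac{\gamma_{nj}}{6}t^3 + \frac{\theta}{2}\left|-\frac{\sigma_{nj}^2}{2}t^2 + \theta\,\frac{\gamma_{nj}}{6}t^3\right|^2.$$

The absolute value of the last term above is less than

$$\frac{\sigma_{nj}^4 t^4}{4} + \frac{\gamma_{nj}^2 t^6}{36} \le \left(\frac{\sigma_{nj}|t|}{4} + \frac{\gamma_{nj}|t|^3}{36}\right)\gamma_{nj}|t|^3 \le \left(\frac{1}{4 \cdot 2} + \frac{1}{36 \cdot 8}\right)\gamma_{nj}|t|^3$$

by (9); hence

$$\log f_{nj}(t) = -\frac{\sigma_{nj}^2}{2}t^2 + \theta\left(\frac{1}{6} + \frac{1}{8} + \frac{1}{288}\right)\gamma_{nj}|t|^3 = -\frac{\sigma_{nj}^2}{2}t^2 + \frac{\theta}{2}\gamma_{nj}|t|^3.$$

Summing over $j$, we have

$$\log f_n(t) = -\frac{t^2}{2} + \frac{\theta}{2}\Gamma_n|t|^3,$$

or explicitly:

$$\left|\log f_n(t) + \frac{t^2}{2}\right| \le \frac{1}{2}\Gamma_n|t|^3.$$

Since $|e^u - 1| \le |u|e^{|u|}$ for all $u$, it follows that

$$|f_n(t)e^{t^2/2} - 1| \le \frac{\Gamma_n|t|^3}{2}\exp\left[\frac{\Gamma_n|t|^3}{2}\right].$$

Since $\Gamma_n|t|^3/2 \le 1/16$ and $e^{1/16} \le 2$, this implies (8).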


Lemma 4. For $|t| < 1/(4\Gamma_n)$, we have

$$|f_n(t)| \le e^{-t^2/3}. \tag{10}$$


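PROOF. We symmetrize (see end of Sec. 6.2) to facilitate the estimation.
We have

$$|f_{nj}(t)|^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \cos t(x - y)\,dF_{nj}(x)\,dF_{nj}(y),$$

since $|f_{nj}|^2$ is real. Using the elementary inequalities

$$\left|\cos u - 1 + \frac{u^2}{2}\right| \le \frac{|u|^3}{6},$$
$$|x - y|^3 \le 4(|x|^3 + |y|^3);$$

we see that the double integral above does not exceed

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left\{1 - \frac{t^2}{2}(x^2 - 2xy + y^2) + \frac{2}{3}|t|^3(|x|^3 + |y|^3)\right\} dF_{nj}(x)\,dF_{nj}(y)$$
$$= 1 - \sigma_{nj}^2 t^2 + \frac{4}{3}\gamma_{nj}|t|^3 \le \exp\left(-\sigma_{nj}^2 t^2 + \frac{4}{3}\gamma_{nj}|t|^3\right).$$

Multiplying over $j$, we obtain

$$|f_n(t)|^2 \le \exp\left(-t^2 + \frac{4}{3}\Gamma_n|t|^3\right) \le e^{-2t^2/3}$$

for the range of $t$ specified in the lemma, proving (10).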


Note that Lemma 4 is weaker than Lemma 3 but valid in a wider range. 
We now combine them. 


Lemma 5. For $|t| < 1/(4\Gamma_n)$, we have

$$|f_n(t) - e^{-t^2/2}| \le 16\Gamma_n|t|^3 e^{-t^2/3}. \tag{11}$$

PROOF. If $|t| < 1/(2\Gamma_n^{1/3})$, this is implied by (8). If $1/(2\Gamma_n^{1/3}) \le |t| <
1/(4\Gamma_n)$, then $1 \le 8\Gamma_n|t|^3$, and so by (10):

$$|f_n(t) - e^{-t^2/2}| \le |f_n(t)| + e^{-t^2/2} \le 2e^{-t^2/3} \le 16\Gamma_n|t|^3 e^{-t^2/3}.$$
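
The inequality (11) can be spot-checked numerically in a simple special case. The sketch below assumes the normalized symmetric Bernoullian array $X_{nj} = X_j/\sqrt{n}$ with $X_j = \pm 1$, for which $f_n(t) = \cos^n(t/\sqrt{n})$ and $\Gamma_n = n \cdot n^{-3/2} = n^{-1/2}$; the function names and the grid size are illustrative choices, and a grid check is of course not a proof.

```python
from math import cos, exp, sqrt

def lemma5_sides(n, t):
    """Return (lhs, rhs) of (11) for X_nj = X_j / sqrt(n), X_j = +-1."""
    gamma_n = 1.0 / sqrt(n)                  # Gamma_n = sum of third moments
    f_n = cos(t / sqrt(n)) ** n              # ch.f. of the normalized sum
    lhs = abs(f_n - exp(-t * t / 2.0))
    rhs = 16.0 * gamma_n * abs(t) ** 3 * exp(-t * t / 3.0)
    return lhs, rhs

def check_lemma5(n, grid_points=2000):
    """True if (11) holds on a grid of 0 < t < 1/(4*Gamma_n)."""
    t_max = sqrt(n) / 4.0
    for i in range(1, grid_points):
        t = t_max * i / grid_points
        lhs, rhs = lemma5_sides(n, t)
        if lhs > rhs:
            return False
    return True
```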


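PROOF OF THEOREM 7.4.1. Apply Lemma 2 with $F = F_n$ and $G = \Phi$. The
$M$ in condition (ii) of Lemma 1 may be taken to be $1/\sqrt{2\pi}$. Since both $F_n$ and $\Phi$
have mean 0 and variance 1, it follows from Chebyshev's inequality that

$$F(x) \vee G(x) \le \frac{1}{x^2}, \quad \text{if } x < 0,$$
$$(1 - F(x)) \vee (1 - G(x)) \le \frac{1}{x^2}, \quad \text{if } x > 0;$$

and consequently

$$\forall x: |F(x) - G(x)| \le \frac{1}{x^2}.$$

Thus condition (iv) of Lemma 2 is satisfied. In (4) we take $T = 1/(4\Gamma_n)$; we
have then from (4) and (11):

$$\sup_x |F_n(x) - \Phi(x)| \le \frac{2}{\pi} \int_0^{1/(4\Gamma_n)} \frac{|f_n(t) - e^{-t^2/2}|}{t}\,dt + \frac{96}{\sqrt{2\pi^3}}\Gamma_n$$
$$\le \frac{32}{\pi}\Gamma_n \int_0^{1/(4\Gamma_n)} t^2 e^{-t^2/3}\,dt + \frac{96}{\sqrt{2\pi^3}}\Gamma_n$$
$$\le \Gamma_n \left\{\frac{32}{\pi} \int_0^{\infty} t^2 e^{-t^2/3}\,dt + \frac{96}{\sqrt{2\pi^3}}\right\}.$$

This establishes (1) with a numerical value for $A_0$ (which may be somewhat
improved).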


Although Theorem 7.4.1 gives the best possible uniform estimate of the
remainder $F_n(x) - \Phi(x)$, namely one that does not depend on $x$, it becomes
less useful if $x = x_n$ increases with $n$ even at a moderate speed. For instance,
we have

$$1 - F_n(x_n) = \frac{1}{\sqrt{2\pi}} \int_{x_n}^{\infty} e^{-y^2/2}\,dy + O(\Gamma_n),$$

where the first "principal" term on the right is asymptotically equal to

$$\frac{1}{\sqrt{2\pi}\,x_n}\,e^{-x_n^2/2}.$$

Hence already when $x_n = \sqrt{2\log(1/\Gamma_n)}$ this will be $o(\Gamma_n)$ for $\Gamma_n \to 0$ and
absorbed by the remainder. For such "large deviations", what is of interest is
an asymptotic evaluation of

$$\frac{1 - F_n(x_n)}{1 - \Phi(x_n)}$$

as $x_n \to \infty$ more rapidly than indicated above. This requires a different type
of approximation by means of "bilateral Laplace transforms", which will not
be discussed here.
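
The remark that the principal term is already $o(\Gamma_n)$ at $x_n = \sqrt{2\log(1/\Gamma_n)}$ can be illustrated numerically. In the sketch below `gamma_n` is simply a small positive number standing in for $\Gamma_n$, and the function names are illustrative assumptions.

```python
from math import erfc, log, sqrt

def normal_tail(x):
    """1 - Phi(x), computed with erfc to keep precision for large x."""
    return 0.5 * erfc(x / sqrt(2.0))

def tail_over_remainder(gamma_n):
    """Ratio (1 - Phi(x_n)) / Gamma_n at x_n = sqrt(2 log(1/Gamma_n))."""
    x_n = sqrt(2.0 * log(1.0 / gamma_n))
    return normal_tail(x_n) / gamma_n
```

Since $e^{-x_n^2/2} = \Gamma_n$ at this choice of $x_n$, the ratio behaves like $1/(\sqrt{2\pi}\,x_n)$ and tends to 0, slowly, as $\Gamma_n \to 0$.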


EXERCISES


1. If $F$ and $G$ are d.f.'s with finite first moments, then

$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty.$$

[HINT: Use Exercise 18 of Sec. 3.2.]

2. If $f$ and $g$ are ch.f.'s such that $f(t) = g(t)$ for $|t| \le T$, then

$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \le \frac{\pi}{T}.$$

This is due to Esseen (Acta Math, 77(1944)).

*3. There exists a universal constant $A_1 > 0$ such that for any sequence
of independent, identically distributed integer-valued r.v.'s $\{X_j\}$ with mean 0
and variance 1, we have

$$\sup_x |F_n(x) - \Phi(x)| \ge \frac{A_1}{n^{1/2}},$$

where $F_n$ is the d.f. of $(\sum_{j=1}^{n} X_j)/\sqrt{n}$. [HINT: Use Exercise 24 of Sec. 6.4.]

4. Prove that for every $x > 0$:

$$\frac{x}{1 + x^2}\,e^{-x^2/2} \le \int_x^{\infty} e^{-y^2/2}\,dy \le \frac{1}{x}\,e^{-x^2/2}.$$
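
The two-sided bound in Exercise 4 can be checked numerically before attempting a proof. The sketch below uses the identity $\int_x^\infty e^{-y^2/2}\,dy = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative assumptions.

```python
from math import erfc, exp, pi, sqrt

def upper_tail_integral(x):
    """Integral of exp(-y^2/2) over y in (x, infinity)."""
    return sqrt(pi / 2.0) * erfc(x / sqrt(2.0))

def mills_bounds(x):
    """Return (lower, integral, upper) from Exercise 4, for x > 0."""
    e = exp(-x * x / 2.0)
    return x / (1.0 + x * x) * e, upper_tail_integral(x), e / x
```

For each $x > 0$ the three returned values should be in increasing order, and the two bounds draw together as $x$ grows.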


7.5 Law of the iterated logarithm 


The law of the iterated logarithm is a crowning achievement in classical probability
theory. It had its origin in attempts to perfect Borel's theorem on normal
numbers (Theorem 5.1.3). In its simplest but basic form, this asserts: if $N_n(\omega)$
denotes the number of occurrences of the digit 1 in the first $n$ places of the
binary (dyadic) expansion of the real number $\omega$ in $[0, 1]$, then $N_n(\omega) \sim n/2$
for almost every $\omega$ in Borel measure. What can one say about the deviation
$N_n(\omega) - n/2$? The order bounds $O(n^{(1/2)+\epsilon})$, $\epsilon > 0$; $O((n \log n)^{1/2})$ (cf.
Theorem 5.4.1); and $O((n \log\log n)^{1/2})$ were obtained successively by Hausdorff
(1913), Hardy and Littlewood (1914), and Khintchine (1922); but in
1924 Khintchine gave the definitive answer:

$$\limsup_{n \to \infty} \frac{N_n(\omega) - \frac{n}{2}}{\sqrt{\frac{1}{2} n \log\log n}} = 1$$

for almost every $\omega$. This sharp result with such a fine order of infinity as "log
log" earned its celebrated name. No less celebrated is the following extension
given by Kolmogorov (1929). Let $\{X_n, n \ge 1\}$ be a sequence of independent
r.v.'s, $S_n = \sum_{j=1}^{n} X_j$; suppose that $\mathscr{E}(X_n) = 0$ for each $n$ and

$$\sup_{\omega} |X_n(\omega)| = o\left(\frac{s_n}{\sqrt{\log\log s_n}}\right), \tag{1}$$

where $s_n^2 = \sigma^2(S_n)$, then we have for almost every $\omega$:

$$\limsup_{n \to \infty} \frac{S_n(\omega)}{\sqrt{2 s_n^2 \log\log s_n}} = 1. \tag{2}$$
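
The normalization in (2) can be explored by simulation for the symmetric Bernoullian walk, where $s_n^2 = n$ and condition (1) holds because the summands are bounded. The sketch below follows one simulated path; the function name, the seed, and the starting index are illustrative assumptions. A finite simulation can only suggest the almost sure limit, since the convergence in (2) is extremely slow.

```python
import math
import random

def largest_lil_ratio(n_steps, seed=0, start=1000):
    """Largest value of S_n / sqrt(2 s_n^2 log log s_n) for start <= n <= n_steps.

    Uses one simulated +-1 walk, for which s_n = sqrt(n).
    """
    rng = random.Random(seed)
    s = 0
    largest = -math.inf
    for n in range(1, n_steps + 1):
        s += 1 if rng.random() < 0.5 else -1
        if n >= start:
            s_n = math.sqrt(n)
            norm = math.sqrt(2.0 * n * math.log(math.log(s_n)))
            largest = max(largest, s / norm)
    return largest
```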


The condition (1) was shown by Marcinkiewicz and Zygmund to be of the
best possible kind, but an interesting complement was added by Hartman and
Wintner that (2) also holds if the $X_n$'s are identically distributed with a finite
second moment. Finally, further sharpening of (2) was given by Kolmogorov
and by Erdős in the Bernoullian case, and in the general case under exact
conditions by Feller; the "last word" being as follows: for any increasing
sequence $\varphi_n$, we have

$$\mathscr{P}\{S_n(\omega) > s_n\varphi_n \text{ i.o.}\} = \begin{cases} 0 \\ 1 \end{cases}$$

according as the series

$$\sum_{n=1}^{\infty} \frac{\varphi_n}{n}\,e^{-\varphi_n^2/2} \quad \begin{cases} < \infty, \\ = \infty. \end{cases}$$

We shall prove the result (2) under a condition different from (1) and 
apparently overlapping it. This makes it possible to avoid an intricate estimate 
concerning “large deviations” in the central limit theorem and to replace it by 
an immediate consequence of Theorem 7.4.1.* It will become evident that the 


* An alternative which bypasses Sec. 7.4 is to use Theorem 7.1.3; the details are left as an 
exercise. 
proof of such a "strong limit theorem" (bounds with probability one) as the
law of the iterated logarithm depends essentially on the corresponding "weak
limit theorem" (convergence of distributions) with a sufficiently good estimate
of the remainder term.

The vital link just mentioned will be given below as Lemma 1. In the
rest of this section "A" will denote a generic strictly positive constant, not
necessarily the same at each appearance, and $\Lambda$ will denote a constant such
that $|\Lambda| \le A$. We shall also use the notation in the preceding statement of
Kolmogorov's theorem, and

$$\gamma_n = \mathscr{E}(|X_n|^3), \qquad \Gamma_n = \sum_{j=1}^{n} \gamma_j$$

as in Sec. 7.4, but for a single sequence of r.v.'s. Let us set also

$$\varphi(\lambda, x) = \sqrt{2\lambda x^2 \log\log x}, \qquad \lambda > 0, x > 0.$$


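Lemma 1. Suppose that for some $\epsilon$, $0 < \epsilon < 1$, we have

$$\frac{\Gamma_n}{s_n^3} \le \frac{A}{(\log s_n)^{1+\epsilon}}. \tag{3}$$

Then for each $\delta$, $0 < \delta < \epsilon$, we have

$$\mathscr{P}\{S_n > \varphi(1 + \delta, s_n)\} \le \frac{A}{(\log s_n)^{1+\delta}}; \tag{4}$$

$$\mathscr{P}\{S_n > \varphi(1 - \delta, s_n)\} \ge \frac{A}{(\log s_n)^{1-(\delta/2)}}. \tag{5}$$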


PROOF. By Theorem 7.4.1, we have for each $x$:

$$\mathscr{P}\{S_n > x s_n\} = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2}\,dy + \Lambda\,\frac{\Gamma_n}{s_n^3}. \tag{6}$$

We have as $x \to \infty$:

$$\int_x^{\infty} e^{-y^2/2}\,dy \sim \frac{e^{-x^2/2}}{x}. \tag{7}$$

(See Exercise 4 of Sec. 7.4). Substituting $x = \sqrt{2(1 \pm \delta)\log\log s_n}$, the first
term on the right side of (6) is, by (7), asymptotically equal to

$$\frac{1}{\sqrt{4\pi(1 \pm \delta)\log\log s_n}}\,\frac{1}{(\log s_n)^{1 \pm \delta}}.$$

This dominates the second (remainder) term on the right side of (6), by (3)
since $0 < \delta < \epsilon$. Hence (4) and (5) follow as rather weak consequences.


To establish (2), let us write for each fixed $\delta$, $0 < \delta < \epsilon$:

$$E_n^+ = \{\omega: S_n(\omega) > \varphi(1 + \delta, s_n)\},$$
$$E_n^- = \{\omega: S_n(\omega) > \varphi(1 - \delta, s_n)\},$$

and proceed by steps.
1°. We prove first that

$$\mathscr{P}\{E_n^+ \text{ i.o.}\} = 0 \tag{8}$$

in the notation introduced in Sec. 4.2, by using the convergence part of
the Borel–Cantelli lemma there. But it is evident from (4) that the series
$\sum_n \mathscr{P}(E_n^+)$ is far from convergent, since $s_n$ is expected to be of the order
of $\sqrt{n}$. The main trick is to apply the lemma to a crucial subsequence $\{n_k\}$
(see Theorem 5.1.2 for a crude form of the trick) chosen to have two properties:
first, $\sum_k \mathscr{P}(E_{n_k}^+)$ converges, and second, "$E_n^+$ i.o." already implies
"$E_{n_k}^+$ i.o." nearly, namely if the given $\delta$ is slightly decreased. This modified
implication is a consequence of a simple but essential probabilistic argument
spelled out in Lemma 2 below.

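Given $c > 1$, let $n_k$ be the largest value of $n$ satisfying $s_n \le c^k$, so that

$$s_{n_k} \le c^k < s_{n_k + 1}.$$

Since $(\max_{1 \le j \le n} \sigma_j)/s_n \to 0$ (why?), we have $s_{n_k+1}/s_{n_k} \to 1$, and so

$$s_{n_k} \sim c^k \tag{9}$$

as $k \to \infty$. Now for each $k$, consider the range of $j$ below:

$$n_k \le j < n_{k+1} \tag{10}$$

and put

$$F_j = \{\omega: |S_{n_{k+1}} - S_j| < s_{n_{k+1}}\}. \tag{11}$$

By Chebyshev's inequality, we have

$$\mathscr{P}(F_j) \ge 1 - \frac{s_{n_{k+1}}^2 - s_{n_k}^2}{s_{n_{k+1}}^2} \to \frac{1}{c^2};$$

hence $\mathscr{P}(F_j) \ge A > 0$ for all sufficiently large $k$.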


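Lemma 2. Let $\{E_j\}$ and $\{F_j\}$, $1 \le j \le n < \infty$, be two sequences of events.
Suppose that for each $j$, the event $F_j$ is independent of $E_1^c \cdots E_{j-1}^c E_j$, and
that there exists a constant $A > 0$ such that $\mathscr{P}(F_j) \ge A$ for every $j$. Then
we have

$$\mathscr{P}\left(\bigcup_{j=1}^{n} E_j F_j\right) \ge A\,\mathscr{P}\left(\bigcup_{j=1}^{n} E_j\right). \tag{12}$$

PROOF. The left member in (12) is equal to

$$\mathscr{P}\left(\bigcup_{j=1}^{n} [(E_1 F_1)^c \cdots (E_{j-1} F_{j-1})^c (E_j F_j)]\right)$$
$$\ge \mathscr{P}\left(\bigcup_{j=1}^{n} [E_1^c \cdots E_{j-1}^c E_j F_j]\right) = \sum_{j=1}^{n} \mathscr{P}(E_1^c \cdots E_{j-1}^c E_j)\,\mathscr{P}(F_j)$$
$$\ge \sum_{j=1}^{n} \mathscr{P}(E_1^c \cdots E_{j-1}^c E_j) \cdot A,$$

which is equal to the right member in (12).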
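Applying Lemma 2 to $E_j^+$ and the $F_j$ in (11), we obtain

$$\mathscr{P}\left(\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ F_j\right) \ge A\,\mathscr{P}\left(\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+\right). \tag{13}$$

It is clear that the event $E_j^+ \cap F_j$ implies

$$S_{n_{k+1}} > S_j - s_{n_{k+1}} > \varphi(1 + \delta, s_j) - s_{n_{k+1}},$$

which is, by (9) and (10), asymptotically greater than

$$\frac{1}{c}\,\varphi\left(1 + \tfrac{3}{4}\delta, s_{n_{k+1}}\right).$$

Choose $c$ so close to 1 that $(1 + \frac{3}{4}\delta)/c^2 > 1 + (\delta/2)$ and put

$$G_k = \left\{\omega: S_{n_{k+1}} > \varphi\left(1 + \frac{\delta}{2}, s_{n_{k+1}}\right)\right\};$$

note that $G_k$ is just $E_{n_{k+1}}^+$ with $\delta$ replaced by $\delta/2$. The above implication may
be written as

$$E_j^+ F_j \subset G_k$$

for sufficiently large $k$ and all $j$ in the range given in (10); hence we have

$$\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ F_j \subset G_k. \tag{14}$$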
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It follows from (4) that 
dG) s » as AD Glogararm <% 
, (log Sn, ) +(6/2) — 7 (k log _ 


In conjunction with (13) and (14) we conclude that 


Ail 


S oP U E;* } <0; 


k j= 
and consequently by the Borel—Cantelli lemma that 


Nyeaiavl 
P\ |J Ejtio.> =0. 
J=nk 
This is equivalent (why?) to the desired result (8). 
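2°. Next we prove that with the same subsequence $\{n_k\}$ but an arbitrary
$c$, if we put $t_k^2 = s_{n_{k+1}}^2 - s_{n_k}^2$ and

$D_k = \bigl\{\omega : S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\bigl(1 - \tfrac{\delta}{2},\ t_k\bigr)\bigr\},$

then we have

(15)    $\mathscr{P}(D_k \text{ i.o.}) = 1.$

Since the differences $S_{n_{k+1}} - S_{n_k}$, $k \ge 1$, are independent r.v.’s, the divergence
part of the Borel–Cantelli lemma is applicable to them. To estimate $\mathscr{P}(D_k)$,
we may apply Lemma 1 to the sequence $\{X_{n_k+j}, j \ge 1\}$. We have

$t_k^2 \sim s_{n_{k+1}}^2\Bigl(1 - \dfrac{1}{c^2}\Bigr) \sim \Bigl(1 - \dfrac{1}{c^2}\Bigr) c^{2(k+1)},$

and consequently by (3):

$\dfrac{\Gamma_{n_{k+1}} - \Gamma_{n_k}}{t_k^3} \le \dfrac{A\,\Gamma_{n_{k+1}}}{s_{n_{k+1}}^3} \le \dfrac{A}{(\log t_k)^{1+\epsilon}}.$

Hence by (5),

$\mathscr{P}(D_k) \ge \dfrac{A}{(\log t_k)^{1-(\delta/4)}} \ge \dfrac{A}{k^{1-(\delta/4)}},$

and so $\sum_k \mathscr{P}(D_k) = \infty$ and (15) follows.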
3°. We shall use (8) and (15) together to prove

(16)    $\mathscr{P}(E_n^{-} \text{ i.o.}) = 1.$

This requires just a rough estimate entailing a suitable choice of $c$. By (8)
applied to $\{-X_n\}$ and (15), for almost every $\omega$ the following two assertions
are true:

(i) $S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi(1 - (\delta/2), t_k)$ for infinitely many $k$;
(ii) $S_{n_k}(\omega) \ge -\varphi(2, s_{n_k})$ for all sufficiently large $k$.

For such an $\omega$, we have then

(17)    $S_{n_{k+1}}(\omega) > \varphi\bigl(1 - \tfrac{\delta}{2},\ t_k\bigr) - \varphi(2, s_{n_k})$ for infinitely many $k$.

Using (9) and $\log\log t_k^2 \sim \log\log s_{n_{k+1}}^2$, we see that the expression in the right
side of (17) is asymptotically greater than

$\Bigl[\sqrt{\Bigl(1 - \dfrac{\delta}{2}\Bigr)\Bigl(1 - \dfrac{1}{c^2}\Bigr)} - \dfrac{\sqrt{2}}{c}\Bigr]\,\varphi(1, s_{n_{k+1}}) > \varphi(1 - \delta, s_{n_{k+1}}),$

provided that $c$ is chosen sufficiently large. Doing this, we have therefore
proved that

(18)    $\mathscr{P}(E_{n_{k+1}}^{-} \text{ i.o.}) = 1,$

which certainly implies (16).

4°. The truth of (8) and (16), for each fixed $\delta$, $0 < \delta < \epsilon$, means exactly
the conclusion (2), by an argument that should by now be familiar to the
reader.


Theorem 7.5.1. Under the condition (3), the lim sup and lim inf, as $n \to \infty$,
of $S_n/\sqrt{2 s_n^2 \log\log s_n}$ are respectively $+1$ and $-1$, with probability one.


The assertion about lim inf follows, of course, from (2) if we apply it
to $\{-X_j, j \ge 1\}$. Recall that (3) is more than sufficient to ensure the validity
of the central limit theorem, namely that $S_n/s_n$ converges in dist. to $\Phi$. Thus
the law of the iterated logarithm complements the central limit theorem by
circumscribing the extraordinary fluctuations of the sequence $\{S_n, n \ge 1\}$. An
immediate consequence is that for almost every $\omega$, the sample sequence $S_n(\omega)$
changes sign infinitely often. For much more precise results in this direction
see Chapter 8.

In view of the discussion preceding Theorem 7.3.3, one may wonder
about the almost everywhere bounds for

$\max_{1 \le m \le n} S_m, \quad \max_{1 \le m \le n} |S_m|,$ and so on.

It is interesting to observe that as far as the $\limsup_n$ is concerned, these two
functionals behave exactly like $S_n$ itself (Exercise 2 below). However, the
question of $\liminf_n$ is quite different. In the case of $\max_{1 \le m \le n} |S_m|$, another
law of the (inverted) iterated logarithm holds as follows. For almost every $\omega$,
we have

$\liminf_{n \to \infty} \dfrac{\max_{1 \le m \le n} |S_m(\omega)|}{\sqrt{\dfrac{\pi^2 s_n^2}{8 \log\log s_n}}} = 1;$

under a condition analogous to but stronger than (3). Finally, one may wonder
about an asymptotic lower bound for $|S_n|$. It is rather trivial to see that this
is always $o(s_n)$ when the central limit theorem is applicable; but actually it is
even $o(s_n^{-1})$ in some general cases. Indeed in the integer lattice case, under
the conditions of Exercise 9 of Sec. 7.3, we have “$S_n = 0$ i.o. a.e.” This kind of
phenomenon belongs really to the recurrence properties of the sequence $\{S_n\}$,
to be discussed in Chapter 8.


EXERCISES 


1. Show that condition (3) is fulfilled if the $X_j$’s have a common d.f.
with a finite third moment.
*2. Prove that whenever (2) holds, then the analogous relations with $S_n$
replaced by $\max_{1 \le m \le n} S_m$ or $\max_{1 \le m \le n} |S_m|$ also hold.


*3. Let $\{X_j, j \ge 1\}$ be a sequence of independent, identically distributed
r.v.’s with mean 0 and variance 1, and $S_n = \sum_{j=1}^{n} X_j$. Then

$\mathscr{P}\Bigl\{\limsup_{n \to \infty} \dfrac{|S_n|}{\sqrt{n}} < \infty\Bigr\} = 0.$

[HINT: Consider $S_{n_{k+1}} - S_{n_k}$ with $n_k \sim k^k$. A quick proof follows from
Theorem 8.3.3 below.]

4. Prove that in Exercise 9 of Sec. 7.3 we have $\mathscr{P}\{S_n = 0 \text{ i.o.}\} = 1$.

*5. The law of the iterated logarithm may be used to supply certain
counterexamples. For instance, if the $X_n$’s are independent and
$X_n = \pm n^{1/2}/\log\log n$ with probability $\frac12$ each, then $S_n/n \to 0$ a.e., but
Kolmogorov’s sufficient condition (see case (i) after Theorem 5.4.1)
$\sum_n \mathscr{E}(X_n^2)/n^2 < \infty$ fails.

6. Prove that $\mathscr{P}\{|S_n| > \varphi(1 - \delta, s_n) \text{ i.o.}\} = 1$, without use of (8), as
follows. Let

$e_k = \{\omega : |S_{n_k}(\omega)| < \varphi(1 - \delta, s_{n_k})\};$

$f_k = \bigl\{\omega : S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\bigl(1 - \tfrac{\delta}{2},\ s_{n_{k+1}}\bigr)\bigr\}.$

Show that for sufficiently large $k$ the event $e_k \cap f_k$ implies the complement
of $e_{k+1}$; hence deduce

$\mathscr{P}\Bigl(\bigcap_{j=j_0}^{k+1} e_j\Bigr) \le \mathscr{P}(e_{j_0}) \prod_{j=j_0}^{k} [1 - \mathscr{P}(f_j)]$

and show that the product $\to 0$ as $k \to \infty$.


7.6 Infinite divisibility 


The weak law of large numbers and the central limit theorem are concerned,
respectively, with the convergence in dist. of sums of independent r.v.’s to a
degenerate and a normal d.f. It may seem strange that we should be so much
occupied with these two apparently unrelated distributions. Let us point out,
however, that in terms of ch.f.’s these two may be denoted, respectively, by
$e^{ait}$ and $e^{ait - b^2t^2}$, that is, exponentials of polynomials of the first and second
degree in $(it)$. This explains the considerable similarity between the two cases, as
evidenced particularly in Theorems 6.4.3 and 6.4.4.

Now the question arises: what other limiting d.f.’s are there when small
independent r.v.’s are added? Specifically, consider the double array (2) in
Sec. 7.1, in which independence in each row and holospoudicity are assumed.
Suppose that for some sequence of constants $a_n$,

$S_n - a_n = \sum_{j=1}^{k_n} X_{nj} - a_n$

converges in dist. to $F$. What is the class of such $F$’s, and when does
such a convergence take place? For a single sequence of independent r.v.’s
$\{X_j, j \ge 1\}$, similar questions may be posed for the “normed sums”
$(S_n - a_n)/b_n$.
These questions have been answered completely by the work of Lévy, Khintchine,
Kolmogorov, and others; for a comprehensive treatment we refer to the
book by Gnedenko and Kolmogorov [12]. Here we must content ourselves
with a modest introduction to this important and beautiful subject.

We begin by recalling other cases of the above-mentioned limiting distributions,
conveniently displayed by their ch.f.’s:

$e^{\lambda(e^{it} - 1)}, \quad \lambda > 0; \qquad e^{-c|t|^{\alpha}}, \quad 0 < \alpha < 2,\ c > 0.$

The former is the Poisson distribution; the latter is called the symmetric
stable distribution of exponent $\alpha$ (see the discussion in Sec. 6.5), including
the Cauchy distribution for $\alpha = 1$. We may include the normal distribution
among the latter for $\alpha = 2$. All these are exponentials and have the further
property that their “$n$th roots”:

$e^{ait/n}, \quad e^{(1/n)(ait - b^2t^2)}, \quad e^{(\lambda/n)(e^{it} - 1)}, \quad e^{-(c/n)|t|^{\alpha}},$

are also ch.f.’s. It is remarkable that this simple property already characterizes
the class of distributions we are looking for (although we shall prove only
part of this fact here).


DEFINITION OF INFINITE DIVISIBILITY. A ch.f. $f$ is called infinitely divisible iff
for each integer $n \ge 1$, there exists a ch.f. $f_n$ such that

(1)    $f = (f_n)^n.$

In terms of d.f.’s, this becomes in obvious notation:

$F = F_n^{n*} = F_n * F_n * \cdots * F_n$  ($n$ factors).

In terms of r.v.’s this means, for each $n \ge 1$, in a suitable probability space
(why “suitable”?): there exist r.v.’s $X$ and $X_{nj}$, $1 \le j \le n$, the latter being
independent among themselves, such that $X$ has ch.f. $f$, $X_{nj}$ has ch.f. $f_n$, and

(2)    $X = \sum_{j=1}^{n} X_{nj}.$

$X$ is thus “divisible” into $n$ independent and identically distributed parts, for
each $n$. That such an $X$, and only such a one, can be the “limit” of sums of
small independent terms as described above seems at least plausible.
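
As a small sanity check of definition (1), the sketch below evaluates the Poisson, normal and Cauchy ch.f.’s at one point and compares each with the $n$th power of the ch.f. of the same family with parameters divided by $n$. The function names, the parameter values, and the choice $n = 7$, $t = 1.3$ are assumptions made for the illustration; the normal ch.f. is written with exponent $ait - b^2t^2$ to match the normalization used in the text.

```python
import cmath
import math


def poisson_chf(t, lam):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


def normal_chf(t, a, b):
    # exponent a*i*t - b^2 t^2, as displayed in the text
    return cmath.exp(1j * a * t - (b * t) ** 2)


def sym_stable_chf(t, c, alpha):
    return math.exp(-c * abs(t) ** alpha)


n, t = 7, 1.3
checks = {
    "poisson": (poisson_chf(t, 3.0 / n) ** n, poisson_chf(t, 3.0)),
    "normal": (normal_chf(t, 2.0 / n, 1.0 / math.sqrt(n)) ** n, normal_chf(t, 2.0, 1.0)),
    "cauchy": (sym_stable_chf(t, 0.5 / n, 1.0) ** n, sym_stable_chf(t, 0.5, 1.0)),
}
for name, (root_power, f) in checks.items():
    print(name, abs(root_power - f))
```

The printed differences are at the level of rounding error. This confirms only the algebraic identity $f = (f_n)^n$ at the sampled point; that each $f_n$ is itself a ch.f. is the known fact about these families.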

A vital, though not characteristic, property of an infinitely divisible ch.f. 
will be given first. 


Theorem 7.6.1. An infinitely divisible ch.f. never vanishes (for real $t$).


PROOF. We shall see presently that a complex-valued ch.f. is troublesome
when its “$n$th root” is to be extracted. So let us avoid this by going to the real,
using the Corollary to Theorem 6.1.4. Let $f$ and $f_n$ be as in (1) and write

$g = |f|^2, \quad g_n = |f_n|^2.$

For each $t \in \mathscr{R}^1$, $g(t)$ being real and positive, though conceivably vanishing,
its real positive $n$th root is uniquely defined; let us denote it by $[g(t)]^{1/n}$. Since
by (1) we have

$g(t) = [g_n(t)]^n,$

and $g_n(t) \ge 0$, it follows that

(3)    $\forall t: \ g_n(t) = [g(t)]^{1/n}.$

But $0 \le g(t) \le 1$, hence $\lim_{n \to \infty} [g(t)]^{1/n}$ is 0 or 1 according as $g(t) = 0$ or
$g(t) \ne 0$. Thus $\lim_{n \to \infty} g_n(t)$ exists for every $t$, and the limit function, say
$h(t)$, can take at most the two possible values 0 and 1. Furthermore, since $g$ is
continuous at $t = 0$ with $g(0) = 1$, there exists a $t_0 > 0$ such that $g(t) \ne 0$ for
$|t| \le t_0$. It follows that $h(t) = 1$ for $|t| \le t_0$. Therefore the sequence of ch.f.’s
$g_n$ converges to the function $h$, which has just been shown to be continuous
at the origin. By the convergence theorem of Sec. 6.3, $h$ must be a ch.f. and
so continuous everywhere. Hence $h$ is identically equal to 1, and so by the
remark after (3) we have

$\forall t: \ |f(t)|^2 = g(t) \ne 0.$

The theorem is proved.


Theorem 7.6.1 immediately shows that the uniform distribution on $[-1, 1]$
is not infinitely divisible, since its ch.f. is $\sin t/t$, which vanishes for some $t$,
although in a literal sense it has infinitely many divisors! (See Exercise 8 of
Sec. 6.3.) On the other hand, the ch.f.

$\dfrac{2 + \cos t}{3}$

never vanishes, but for it (1) fails even when $n = 2$; when $n \ge 3$, the failure
of (1) for this ch.f. is an immediate consequence of Exercise 7 of Sec. 6.1, if
we notice that the corresponding p.m. consists of exactly 3 atoms.
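
The failure for $n = 2$ can be made concrete. The p.m. with ch.f. $(2 + \cos t)/3$ has atoms at $-1, 0, 1$ with masses $\frac16, \frac23, \frac16$. If $\mu * \mu$ equalled it, every sum of two points of the support of $\mu$ would lie in $\{-1, 0, 1\}$, which forces the support into $\{-\frac12, \frac12\}$; matching the outer atoms then gives both masses equal to $1/\sqrt6$, and the middle atom of $\mu * \mu$ would be $\frac13$ rather than $\frac23$. The sketch below (variable names are illustrative) checks the arithmetic and the lower bound $\frac13$ of the ch.f. on a grid.

```python
import math

# (2 + cos t)/3 = 2/3 + (1/6) e^{it} + (1/6) e^{-it}: atoms at -1, 0, 1.
atoms = {-1: 1 / 6, 0: 2 / 3, 1: 1 / 6}


def chf(t):
    return sum(mass * complex(math.cos(t * x), math.sin(t * x))
               for x, mass in atoms.items())


# The ch.f. is real and bounded below by 1/3, so it never vanishes.
grid = [k * 0.01 for k in range(-1000, 1001)]
min_value = min(chf(t).real for t in grid)

# A square root mu would carry masses a at -1/2 and b at +1/2 with
# a^2 = 1/6 and b^2 = 1/6; the middle atom of mu * mu would then be 2ab.
a = b = math.sqrt(1 / 6)
middle_mass = 2 * a * b
print(min_value, middle_mass, atoms[0])
```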

Now that we have proved Theorem 7.6.1, it seems natural to go back to
an arbitrary infinitely divisible ch.f. and establish the generalization of (3):

$f_n(t) = [f(t)]^{1/n}$

for some “determination” of the multiple-valued $n$th root on the right side. This
can be done by a simple process of “continuous continuation” of a complex-valued
function of a real variable. Although merely an elementary exercise in
“complex variables”, it has been treated in the existing literature in a cavalier
fashion and then misused or abused. For this reason the main propositions
will be spelled out in meticulous detail here.


Theorem 7.6.2. Let a complex-valued function $f$ of the real variable $t$ be
given. Suppose that $f(0) = 1$ and that for some $T > 0$, $f$ is continuous in
$[-T, T]$ and does not vanish in the interval. Then there exists a unique (single-valued)
function $\lambda$ of $t$ in $[-T, T]$ with $\lambda(0) = 0$ that is continuous there and
satisfies

(4)    $f(t) = e^{\lambda(t)}, \quad -T \le t \le T.$

The corresponding statement when $[-T, T]$ is replaced by $(-\infty, \infty)$ is
also true.


PROOF. Consider the range of $f(t)$, $t \in [-T, T]$; this is a closed set of
points in the complex plane. Since it does not contain the origin, we have

$\inf_{-T \le t \le T} |f(t) - 0| = \rho_T > 0.$

Next, since $f$ is uniformly continuous in $[-T, T]$, there exists a $\delta_T$, $0 < \delta_T < \rho_T$,
such that if $t$ and $t'$ both belong to $[-T, T]$ and $|t - t'| \le \delta_T$, then
$|f(t) - f(t')| \le \rho_T/2 \le \frac12$. Now divide $[-T, T]$ into equal parts of length less than
$\delta_T$, say:

$-T = t_{-\ell} < \cdots < t_{-1} < t_0 = 0 < t_1 < \cdots < t_{\ell} = T.$

For $t_{-1} \le t \le t_1$, we define $\lambda$ as follows:

(5)    $\lambda(t) = \sum_{j=1}^{\infty} \dfrac{(-1)^{j-1}}{j} \{f(t) - 1\}^j.$

This is a continuous function of $t$ in $[t_{-1}, t_1]$, representing that determination
of $\log f(t)$ which equals 0 for $t = 0$. Suppose that $\lambda$ has already been defined
in $[t_{-k}, t_k]$; then we define $\lambda$ in $[t_k, t_{k+1}]$ as follows:

(6)    $\lambda(t) = \lambda(t_k) + \sum_{j=1}^{\infty} \dfrac{(-1)^{j-1}}{j} \Bigl\{\dfrac{f(t) - f(t_k)}{f(t_k)}\Bigr\}^j;$

similarly in $[t_{-k-1}, t_{-k}]$ by replacing $t_k$ with $t_{-k}$ everywhere on the right side
above. Since we have, for $t_k \le t \le t_{k+1}$,

$\Bigl|\dfrac{f(t) - f(t_k)}{f(t_k)}\Bigr| \le \dfrac{\rho_T/2}{\rho_T} = \dfrac12,$

the power series in (6) converges uniformly in $[t_k, t_{k+1}]$ and represents a
continuous function there equal to that determination of the logarithm of the
function $f(t)/f(t_k)$ which is 0 for $t = t_k$. Specifically, for the “schlicht
neighborhood” $|z - 1| \le \frac12$, let

(7)    $L(z) = \sum_{j=1}^{\infty} \dfrac{(-1)^{j-1}}{j} (z - 1)^j$

be the unique determination of $\log z$ vanishing at $z = 1$. Then (5) and (6)
become, respectively:

$\lambda(t) = L(f(t)), \quad t_{-1} \le t \le t_1;$

$\lambda(t) = \lambda(t_k) + L\Bigl(\dfrac{f(t)}{f(t_k)}\Bigr), \quad t_k \le t \le t_{k+1};$

with a similar expression for $t_{-k-1} \le t \le t_{-k}$. Thus (4) is satisfied in $[t_{-1}, t_1]$,
and if it is satisfied for $t = t_k$, then it is satisfied in $[t_k, t_{k+1}]$, since

$e^{\lambda(t)} = e^{\lambda(t_k) + L(f(t)/f(t_k))} = f(t_k)\,\dfrac{f(t)}{f(t_k)} = f(t).$

Thus (4) is satisfied in $[-T, T]$ by induction, and the theorem is proved for
such an interval. To prove it for $(-\infty, \infty)$ let us observe that, having defined
$\lambda$ in $[-n, n]$, we can extend it to $[-n - 1, n + 1]$ with the previous method,
by dividing $[n, n + 1]$, for example, into small equal parts whose length must
be chosen dependent on $n$ at each stage (why?). The continuity of $\lambda$ is clear
from the construction.

To prove the uniqueness of $\lambda$, suppose that $\lambda'$ has the same properties as
$\lambda$. Since both satisfy equation (4), it follows that for each $t$, there exists an
integer $m(t)$ such that

$\lambda(t) - \lambda'(t) = 2\pi i\, m(t).$

The left side being continuous in $t$, $m(\cdot)$ must be a constant (why?), which
must be equal to $m(0) = 0$. Thus $\lambda(t) = \lambda'(t)$.


Remark. It may not be amiss to point out that $\lambda(t)$ is a single-valued
function of $t$ but not of $f(t)$; see Exercise 7 below.
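
The construction in the proof of Theorem 7.6.2 translates into a short numerical procedure. The sketch below is one possible rendering, not the author’s: it walks along an equal grid and adds the principal logarithm of $f(t_{k+1})/f(t_k)$ at each step, which agrees with the series $L$ of (7) as long as the ratio stays in $|z - 1| \le \frac12$. The function name `distinguished_log`, the grid size, and the test case $f(t) = e^{iat}$ with $a = 5$ (the case of Exercise 7) are assumptions made for the illustration.

```python
import cmath


def distinguished_log(f, T, steps):
    """Continue log f from t = 0 to t = T along an equal grid, as in (5)-(6).

    Each increment uses the principal logarithm of f(t_{k+1}) / f(t_k), which
    agrees with the series L in (7) as long as that ratio stays within
    |z - 1| <= 1/2; the grid must be fine enough for this to hold.
    """
    lam = 0j                      # lambda(0) = 0 because f(0) = 1
    prev = f(0.0)
    for k in range(1, steps + 1):
        cur = f(T * k / steps)
        ratio = cur / prev
        assert abs(ratio - 1) <= 0.5, "grid too coarse for the series in (7)"
        lam += cmath.log(ratio)
        prev = cur
    return lam


a, T = 5.0, 10.0
f = lambda t: cmath.exp(1j * a * t)     # the case of Exercise 7

lam_T = distinguished_log(f, T, steps=1000)
principal = cmath.log(f(T))
print(lam_T, principal)
```

The continued value is close to $50i = iaT$, whereas the principal logarithm of the single number $f(T)$ has imaginary part in $(-\pi, \pi]$. This is the point of the Remark: $\lambda$ depends on the path $t \mapsto f(t)$ and not on the value $f(t)$ alone.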


Theorem 7.6.3. For a fixed $T$, let each ${}_kf$, $k \ge 1$, as well as $f$ satisfy the
conditions for $f$ in Theorem 7.6.2, and denote the corresponding $\lambda$ by ${}_k\lambda$.
Suppose that ${}_kf$ converges uniformly to $f$ in $[-T, T]$, then ${}_k\lambda$ converges
uniformly to $\lambda$ in $[-T, T]$.


PROOF. Let $L$ be as in (7), then there exists a $\delta$, $0 < \delta < \frac12$, such that

$|L(z)| \le 1$, if $|z - 1| \le \delta$.

By the hypothesis of uniformity, there exists $k_1(T)$ such that if $k \ge k_1(T)$,
then we have

(8)    $\sup_{|t| \le T} \Bigl|\dfrac{{}_kf(t)}{f(t)} - 1\Bigr| \le \delta,$

and consequently

(9)    $\sup_{|t| \le T} \Bigl|L\Bigl(\dfrac{{}_kf(t)}{f(t)}\Bigr)\Bigr| \le 1.$

Since for each $t$, the exponentials of ${}_k\lambda(t) - \lambda(t)$ and $L({}_kf(t)/f(t))$ are equal,
there exists an integer-valued function ${}_km(t)$, $|t| \le T$, such that

(10)    $L\Bigl(\dfrac{{}_kf(t)}{f(t)}\Bigr) = {}_k\lambda(t) - \lambda(t) + 2\pi i\, {}_km(t), \quad |t| \le T.$

Since $L$ is continuous in $|z - 1| \le \delta$, it follows that ${}_km(\cdot)$ is continuous in
$|t| \le T$. Since it is integer-valued and equals 0 at $t = 0$, it is identically zero.
Thus (10) reduces to

(11)    ${}_k\lambda(t) - \lambda(t) = L\Bigl(\dfrac{{}_kf(t)}{f(t)}\Bigr), \quad |t| \le T.$

The function $L$ being continuous at $z = 1$, the uniform convergence of ${}_kf/f$
to 1 in $[-T, T]$ implies that of ${}_k\lambda - \lambda$ to 0, as asserted by the theorem.


Thanks to Theorem 7.6.1, Theorem 7.6.2 is applicable to each infinitely
divisible ch.f. $f$ in $(-\infty, \infty)$. Henceforth we shall call the corresponding $\lambda$
the distinguished logarithm, and $e^{\lambda(t)/n}$ the distinguished $n$th root of $f$. We
can now extract the correct $n$th root in (1) above.


Theorem 7.6.4. For each $n$, the $f_n$ in (1) is just the distinguished $n$th root
of $f$.


PROOF. It follows from Theorem 7.6.1 and (1) that the ch.f. $f_n$ never
vanishes in $(-\infty, \infty)$, hence its distinguished logarithm $\lambda_n$ is defined. Taking
multiple-valued logarithms in (1), we obtain as in (10):

$\forall t: \ \lambda(t) - n\lambda_n(t) = 2\pi i\, m_n(t),$

where $m_n(\cdot)$ takes only integer values. We conclude as before that $m_n(\cdot) \equiv 0$,
and consequently

(12)    $f_n(t) = e^{\lambda_n(t)} = e^{\lambda(t)/n}$

as asserted.


Corollary. If $f$ is a positive infinitely divisible ch.f., then for every $t$ the
$f_n(t)$ in (1) is just the real positive $n$th root of $f(t)$.


PROOF. Elementary analysis shows that the real-valued logarithm of a 
real number x in (0, oo) is a continuous function of x. It follows that this is 
a continuous solution of equation (4) in (—oo, oo). The uniqueness assertion 
in Theorem 7.6.2 then identifies it with the distinguished logarithm of f, and 
the corollary follows, since the real positive nth root is the exponential of 1/n 
times the real logarithm. 


As an immediate consequence of (12), we have

$\forall t: \ \lim_{n \to \infty} f_n(t) = 1.$

Thus by Theorem 7.1.1 the double array $\{X_{nj}, 1 \le j \le n, 1 \le n\}$ giving rise
to (2) is holospoudic. We have therefore proved that each infinitely divisible
distribution can be obtained as the limiting distribution of $S_n = \sum_{j=1}^{n} X_{nj}$ in
such an array.

It is trivial that the product of two infinitely divisible ch.f.’s is again such
a one, for we have in obvious notation:

$f \cdot g = (f_n)^n \cdot (g_n)^n = (f_n \cdot g_n)^n.$

The next proposition lies deeper.


Theorem 7.6.5. Let $\{{}_kf, k \ge 1\}$ be a sequence of infinitely divisible ch.f.’s
converging everywhere to the ch.f. $f$. Then $f$ is infinitely divisible.


PROOF. The difficulty is to prove first that $f$ never vanishes. Consider, as
in the proof of Theorem 7.6.1: $g = |f|^2$, ${}_kg = |{}_kf|^2$. For each $n \ge 1$, let $x^{1/n}$
denote the real positive $n$th root of a real positive $x$. Then we have, by the
hypothesis of convergence and the continuity of $x^{1/n}$ as a function of $x$,

(13)    $\forall t: \ [{}_kg(t)]^{1/n} \to [g(t)]^{1/n}.$

By the Corollary to Theorem 7.6.4, the left member in (13) is a ch.f. The right
member is continuous everywhere. It follows from the convergence theorem
for ch.f.’s that $[g(\cdot)]^{1/n}$ is a ch.f. Since $g$ is its $n$th power, and this is true for
each $n \ge 1$, we have proved that $g$ is infinitely divisible and so never vanishes.
Hence $f$ never vanishes and has a distinguished logarithm $\lambda$ defined everywhere.
Let that of ${}_kf$ be ${}_k\lambda$. Since the convergence of ${}_kf$ to $f$ is necessarily
uniform in each finite interval (see Sec. 6.3), it follows from Theorem 7.6.3
that ${}_k\lambda \to \lambda$ everywhere, and consequently

(14)    $\exp({}_k\lambda(t)/n) \to \exp(\lambda(t)/n)$

for every $t$. The left member in (14) is a ch.f. by Theorem 7.6.4, and the right
member is continuous by the definition of $\lambda$. Hence it follows as before that
$e^{\lambda(\cdot)/n}$ is a ch.f. and $f$ is infinitely divisible.

The following alternative proof is interesting. There exists a $\delta > 0$ such
that $f$ does not vanish for $|t| \le \delta$, hence $\lambda$ is defined in this interval. For
each $n$, (14) holds uniformly in this interval by Theorem 7.6.3. By Exercise 6
of Sec. 6.3, this is sufficient to ensure the existence of a subsequence from
$\{\exp({}_k\lambda(t)/n), k \ge 1\}$ converging everywhere to some ch.f. $\varphi_n$. The $n$th power
of this subsequence then converges to $(\varphi_n)^n$, but being a subsequence of $\{{}_kf\}$
it also converges to $f$. Hence $f = (\varphi_n)^n$, and we conclude again that $f$ is
infinitely divisible.


Using the preceding theorem, we can construct a wide class of ch.f.’s
that are infinitely divisible. For each $a > 0$ and real $u$, the function

(15)    $\mathfrak{P}(t; a, u) = e^{a(e^{iut} - 1)}$

is an infinitely divisible ch.f., since it is obtained from the Poisson ch.f. with
parameter $a$ by substituting $ut$ for $t$. We shall call such a ch.f. a generalized
Poisson ch.f. A finite product of these:

(16)    $\prod_{j=1}^{k} \mathfrak{P}(t; a_j, u_j) = \exp\Bigl\{\sum_{j=1}^{k} a_j (e^{itu_j} - 1)\Bigr\}$

is then also infinitely divisible. Now if $G$ is any bounded increasing function,
the integral $\int_{-\infty}^{\infty} (e^{itu} - 1)\,dG(u)$ may be approximated by sums of the kind
appearing as exponent in the right member of (16), for all $t$ in $\mathscr{R}^1$ and indeed
uniformly so in every finite interval (why?). It follows that for each such $G$,
the function

(17)    $f(t) = \exp\Bigl[\int_{-\infty}^{\infty} (e^{itu} - 1)\,dG(u)\Bigr]$

is an infinitely divisible ch.f. Now it turns out that although this falls somewhat
short of being the most general form of an infinitely divisible ch.f.,
we have nevertheless the following qualitative result, which is a complete
generalization of (16).
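
The approximation of (17) by finite products of the form (16) can be seen numerically. The sketch below assumes, purely for illustration, that $G(u) = u$ on $[0, 1]$ (constant outside), takes the $u_j$ to be midpoints of $k$ equal cells with $a_j = 1/k$, and compares the product with the closed form $\exp\bigl[(e^{it} - 1)/(it) - 1\bigr]$ of (17) for this $G$. The function names and the evaluation point $t = 2$ are likewise assumptions.

```python
import cmath


def generalized_poisson(t, a, u):
    """The ch.f. in (15): exp(a (e^{iut} - 1))."""
    return cmath.exp(a * (cmath.exp(1j * u * t) - 1))


def product_approximation(t, k):
    """Finite product (16) with u_j the midpoints of k equal cells of [0, 1]
    and a_j = G(cell) = 1/k, for the illustrative choice G(u) = u on [0, 1]."""
    value = 1 + 0j
    for j in range(k):
        value *= generalized_poisson(t, 1.0 / k, (j + 0.5) / k)
    return value


def limit_chf(t):
    """(17) for the same G: exp( integral_0^1 (e^{itu} - 1) du ), t != 0."""
    integral = (cmath.exp(1j * t) - 1) / (1j * t) - 1
    return cmath.exp(integral)


t = 2.0
for k in (10, 100, 1000):
    print(k, abs(product_approximation(t, k) - limit_chf(t)))
```

The printed errors shrink roughly like $k^{-2}$, as expected for a midpoint sum of a smooth integrand.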


Theorem 7.6.6. For each infinitely divisible ch.f. $f$, there exists a double
array of pairs of real constants $(a_{nj}, u_{nj})$, $1 \le j \le k_n$, $1 \le n$, where $a_{nj} > 0$,
such that

(18)    $f(t) = \lim_{n \to \infty} \prod_{j=1}^{k_n} \mathfrak{P}(t; a_{nj}, u_{nj}).$

The converse is also true. Thus the class of infinitely divisible d.f.’s coincides
with the closure, with respect to vague convergence, of convolutions of a finite
number of generalized Poisson d.f.’s.


PROOF. Let $f$ and $f_n$ be as in (1) and let $\lambda$ be the distinguished logarithm
of $f$, $F_n$ the d.f. corresponding to $f_n$. We have for each $t$, as $n \to \infty$:

$n[f_n(t) - 1] = n[e^{\lambda(t)/n} - 1] \to \lambda(t)$

and consequently

(19)    $e^{n[f_n(t) - 1]} \to e^{\lambda(t)} = f(t).$

Actually the first member in (19) is a ch.f. by Theorem 6.5.6, so that the
convergence is uniform in each finite interval, but this fact alone is neither
necessary nor sufficient for what follows. We have

$n[f_n(t) - 1] = \int_{-\infty}^{\infty} (e^{itu} - 1)\, n\, dF_n(u).$

For each $n$, $nF_n$ is a bounded increasing function, hence there exists

$\{a_{nj}, u_{nj};\ 1 \le j \le k_n\}$

where $-\infty < u_{n1} < u_{n2} < \cdots < u_{nk_n} < \infty$ and $a_{nj} = n[F_n(u_{nj}) - F_n(u_{n,j-1})]$,
such that

(20)    $\sup_{|t| \le n} \Bigl|\sum_{j=1}^{k_n} (e^{itu_{nj}} - 1)\, a_{nj} - \int_{-\infty}^{\infty} (e^{itu} - 1)\, n\, dF_n(u)\Bigr| \le \dfrac{1}{n}.$

(Which theorem in Chapter 6 implies this?) Taking exponentials and using
the elementary inequality $|e^{z} - e^{z'}| \le |e^{z}|(e^{|z - z'|} - 1)$, we conclude that as
$n \to \infty$,

(21)    $\sup_{|t| \le n} \Bigl|e^{n[f_n(t) - 1]} - \prod_{j=1}^{k_n} \mathfrak{P}(t; a_{nj}, u_{nj})\Bigr| = O\Bigl(\dfrac{1}{n}\Bigr).$

This and (19) imply (18). The converse is proved at once by Theorem 7.6.5.


We are now in a position to state the fundamental theorem on infinitely 
divisible ch.f.’s, due to P. Lévy and Khintchine. 


Theorem 7.6.7. Every infinitely divisible ch.f. $f$ has the following canonical
representation:

$f(t) = \exp\Bigl[ait + \int_{-\infty}^{\infty} \Bigl(e^{itu} - 1 - \dfrac{itu}{1 + u^2}\Bigr) \dfrac{1 + u^2}{u^2}\, dG(u)\Bigr],$

where $a$ is a real constant, $G$ is a bounded increasing function in $(-\infty, \infty)$,
and the integrand is defined by continuity to be $-t^2/2$ at $u = 0$. Furthermore,
the class of infinitely divisible ch.f.’s coincides with the class of limiting ch.f.’s
of $\sum_{j=1}^{k_n} X_{nj} - a_n$ in a holospoudic double array

$\{X_{nj}, 1 \le j \le k_n, 1 \le n\},$

where $k_n \to \infty$ and for each $n$, the r.v.’s $\{X_{nj}, 1 \le j \le k_n\}$ are independent.


Note that we have proved above that every infinitely divisible ch.f. is in 
the class of limiting ch.f.’s described here, although we did not establish the 
canonical representation. Note also that if the hypothesis of “holospoudicity” 
is omitted, then every ch.f. is such a limit, trivially (why?). For a complete 
proof of the theorem, various special cases, and further developments, see the 
book by Gnedenko and Kolmogorov [12].

Let us end this section with an interesting example. Put $s = \sigma + it$, $\sigma > 1$
and $t$ real; consider the Riemann zeta function:

$\zeta(s) = \sum_{n=1}^{\infty} \dfrac{1}{n^s} = \prod_{p} \Bigl(1 - \dfrac{1}{p^s}\Bigr)^{-1},$

where $p$ ranges over all prime numbers. Fix $\sigma > 1$ and define

$f(t) = \dfrac{\zeta(\sigma + it)}{\zeta(\sigma)}.$

We assert that $f$ is an infinitely divisible ch.f. For each $p$ and every real $t$,
the complex number $1 - p^{-\sigma - it}$ lies within the circle $\{z : |z - 1| < \frac12\}$. Let
$\log z$ denote that determination of the logarithm with an angle in $(-\pi, \pi]$. By
looking at the angles, we see that

$\log \dfrac{1 - p^{-\sigma}}{1 - p^{-\sigma - it}} = \log(1 - p^{-\sigma}) - \log(1 - p^{-\sigma - it})$

$= \sum_{m=1}^{\infty} \dfrac{1}{m p^{m\sigma}} \bigl(e^{-(m \log p) it} - 1\bigr)$

$= \sum_{m=1}^{\infty} \log \mathfrak{P}(t; m^{-1} p^{-m\sigma}, -m \log p).$

Since

$f(t) = \lim_{n \to \infty} \prod_{p \le n} \prod_{m=1}^{n} \mathfrak{P}(t; m^{-1} p^{-m\sigma}, -m \log p),$

it follows that $f$ is an infinitely divisible ch.f.
So far as known, this famous relationship between two “big names” has
produced no important issue.
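
The identity for a single Euler factor is easy to test numerically. The sketch below compares the principal logarithm of $(1 - p^{-\sigma})/(1 - p^{-\sigma - it})$ with a truncated sum of the logarithms of generalized Poisson ch.f.’s; the function names, the truncation at 200 terms, and the sample values $p = 2$, $\sigma = 2$, $t = 1.7$ are assumptions made for the illustration.

```python
import cmath
import math


def euler_factor_log(p, sigma, t):
    """Principal log of (1 - p^{-sigma}) / (1 - p^{-sigma - it})."""
    num = 1 - p ** (-sigma)
    den = 1 - cmath.exp(-(sigma + 1j * t) * math.log(p))
    return cmath.log(num / den)


def poisson_log_sum(p, sigma, t, terms=200):
    """Sum over m of log P(t; a_m, u_m), a_m = 1/(m p^{m sigma}), u_m = -m log p."""
    total = 0j
    for m in range(1, terms + 1):
        a_m = 1.0 / (m * p ** (m * sigma))
        u_m = -m * math.log(p)
        total += a_m * (cmath.exp(1j * u_m * t) - 1)
    return total


p, sigma, t = 2, 2.0, 1.7
print(euler_factor_log(p, sigma, t), poisson_log_sum(p, sigma, t))
```

The two printed numbers agree to rounding error, since the terms $1/(m p^{m\sigma})$ decay geometrically.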


EXERCISES 


1. Is the convex combination of infinitely divisible ch.f.’s also infinitely
divisible?

2. If $f$ is an infinitely divisible ch.f. and $\lambda$ its distinguished logarithm,
$r > 0$, then the $r$th power of $f$ is defined to be $e^{r\lambda(t)}$. Prove that for each $r > 0$
it is an infinitely divisible ch.f.

*3. Let $f$ be a ch.f. such that there exists a sequence of positive integers
$n_k$ going to infinity and a sequence of ch.f.’s $\varphi_k$ satisfying $f = (\varphi_k)^{n_k}$; then
$f$ is infinitely divisible.

4. Give another proof that the right member of (17) is an infinitely divisible
ch.f. by using Theorem 6.5.6.

5. Show that $f(t) = (1 - b)/(1 - be^{it})$, $0 < b < 1$, is an infinitely divisible
ch.f. [HINT: Use canonical form.]

6. Show that the d.f. with density $\beta^{\alpha} \Gamma(\alpha)^{-1} x^{\alpha - 1} e^{-\beta x}$, $\alpha > 0$, $\beta > 0$, in
$(0, \infty)$, and 0 otherwise, is infinitely divisible.

7. Carry out the proof of Theorem 7.6.2 specifically for the “trivial” but
instructive case $f(t) = e^{iat}$, where $a$ is a fixed real number.

*8. Give an example to show that in Theorem 7.6.3, if the uniformity of
convergence of ${}_kf$ to $f$ is omitted, then ${}_k\lambda$ need not converge to $\lambda$. [HINT:
${}_kf(t) = \exp\{2\pi i (-1)^k kt (1 + kt)^{-1}\}$.]

9. Let $f(t) = 1 - t$, $f_k(t) = 1 - t + (-1)^k i t k^{-1}$, $0 \le t \le 2$, $k \ge 1$. Then
$f_k$ never vanishes and converges uniformly to $f$ in $[0, 2]$. Let $\sqrt{f_k}$ denote the
distinguished square root of $f_k$ in $[0, 2]$. Show that $\sqrt{f_k}$ does not converge
in any neighborhood of $t = 1$. Why is Theorem 7.6.3 not applicable? [This
example is supplied by E. Reich.]

*10. Some writers have given the proof of Theorem 7.6.6 by apparently
considering each fixed $t$ and using an analogue of (21) without the “$\sup_{|t| \le n}$”
there. Criticize this “quick proof”. [HINT: Show that the two relations

$\forall t$ and $\forall m$: $\lim_{n \to \infty} u_{mn}(t) = u_m(t),$

$\forall t$: $\lim_{m \to \infty} u_m(t) = u(t),$

do not imply the existence of a sequence $\{m_n\}$ such that

$\forall t$: $\lim_{n \to \infty} u_{m_n n}(t) = u(t).$

Indeed, they do not even imply the existence of two subsequences $\{m_\nu\}$ and
$\{n_\nu\}$ such that

$\forall t$: $\lim_{\nu \to \infty} u_{m_\nu n_\nu}(t) = u(t).$

Thus the extension of Lemma 1 in Sec. 7.2 is false.]
The three “counterexamples” in Exercises 7 to 9 go to show that the
cavalierism alluded to above is not to be shrugged off easily.
11. Strengthening Theorem 6.5.5, show that two infinitely divisible
ch.f.’s may coincide in a neighborhood of 0 without being identical.

*12. Reconsider Exercise 17 of Sec. 6.4 and try to apply Theorem 7.6.3.
[HINT: The latter is not immediately applicable, owing to the lack of uniform
convergence. However, show first that if $e^{ic_nt}$ converges for $t \in A$, where
$m(A) > 0$, then it converges for all $t$. This follows from a result due to Steinhaus,
asserting that the difference set $A - A$ contains a neighborhood of 0 (see,
e.g., Halmos [4, p. 68]), and from the equation $e^{ic_nt} e^{ic_nt'} = e^{ic_n(t + t')}$. Let $\{b_n\}$,
$\{b_n'\}$ be any two subsequences of $c_n$, then $e^{i(b_n - b_n')t} \to 1$ for all $t$. Since 1 is a
ch.f., the convergence is uniform in every finite interval by the convergence
theorem for ch.f.’s. Alternatively, if

$\varphi(t) = \lim_{n \to \infty} e^{ic_nt}$

then $\varphi$ satisfies Cauchy’s functional equation and must be of the form $e^{ict}$,
which is a ch.f. These approaches are fancier than the simple one indicated
in the hint for the said exercise, but they are interesting. There is no known
quick proof by “taking logarithms”, as some authors have done.]


8 | Random walk


8.1 Zero-or-one laws 


In this chapter we adopt the notation $N$ for the set of strictly positive integers,
and $N^0$ for the set of positive integers; used as an index set, each is endowed
with the natural ordering and interpreted as a discrete time parameter. Similarly,
for each $n \in N$, $N_n$ denotes the ordered set of integers from 1 to $n$ (both
inclusive); $N_n^0$ that of integers from 0 to $n$ (both inclusive); and $N_n'$ that of
integers beginning with $n + 1$.

On the probability triple $(\Omega, \mathscr{F}, \mathscr{P})$, a sequence $\{X_n, n \in N\}$ where each
$X_n$ is an r.v. (defined on $\Omega$ and finite a.e.), will be called a (discrete parameter)
stochastic process. Various Borel fields connected with such a process will now
be introduced. For any sub-B.F. $\mathscr{G}$ of $\mathscr{F}$, we shall write

(1)    $X \in \mathscr{G}$

and use the expression “$X$ belongs to $\mathscr{G}$” or “$\mathscr{G}$ contains $X$” to mean that
$X^{-1}(\mathscr{B}^1) \subset \mathscr{G}$ (see Sec. 3.1 for notation): in the customary language $X$ is said
to be “measurable with respect to $\mathscr{G}$”. For each $n \in N$, we define two B.F.’s
as follows:

$\mathscr{F}_n$ = the augmented B.F. generated by the family of r.v.’s $\{X_k, k \in N_n\}$;
that is, $\mathscr{F}_n$ is the smallest B.F. $\mathscr{G}$ containing all $X_k$ in the family
and all null sets;

$\mathscr{F}_n'$ = the augmented B.F. generated by the family of r.v.’s $\{X_k, k \in N_n'\}$.


Recall that the union $\bigcup_{n=1}^{\infty} \mathscr{F}_n$ is a field but not necessarily a B.F. The
smallest B.F. containing it, or equivalently containing every $\mathscr{F}_n$, $n \in N$, is
denoted by

$\mathscr{F}_{\infty} = \bigvee_{n=1}^{\infty} \mathscr{F}_n;$

it is the B.F. generated by the stochastic process $\{X_n, n \in N\}$. On the other
hand, the intersection $\bigcap_{n=1}^{\infty} \mathscr{F}_n'$ is a B.F. denoted also by $\bigwedge_{n=1}^{\infty} \mathscr{F}_n'$. It will be
called the remote field of the stochastic process and a member of it a remote
event.

Since $\mathscr{F}_{\infty} \subset \mathscr{F}$, $\mathscr{P}$ is defined on $\mathscr{F}_{\infty}$. For the study of the process $\{X_n, n \in N\}$
alone, it is sufficient to consider the reduced triple $(\Omega, \mathscr{F}_{\infty}, \mathscr{P}|_{\mathscr{F}_{\infty}})$. The
following approximation theorem is fundamental.


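Theorem 8.1.1. Given $\epsilon > 0$ and $\Lambda \in \mathscr{F}_{\infty}$, there exists $\Lambda_{\epsilon} \in \bigcup_{n=1}^{\infty} \mathscr{F}_n$
such that

(2)    $\mathscr{P}(\Lambda \,\triangle\, \Lambda_{\epsilon}) \le \epsilon.$


PROOF. Let $\mathscr{G}$ be the collection of sets $\Lambda$ for which the assertion of the
theorem is true. Suppose $\Lambda_k \in \mathscr{G}$ for each $k \in N$ and $\Lambda_k \uparrow \Lambda$ or $\Lambda_k \downarrow \Lambda$. Then
$\Lambda$ also belongs to $\mathscr{G}$, as we can easily see by first taking $k$ large and then
applying the asserted property to $\Lambda_k$. Thus $\mathscr{G}$ is a monotone class. Since it is
trivial that $\mathscr{G}$ contains the field $\bigcup_{n=1}^{\infty} \mathscr{F}_n$ that generates $\mathscr{F}_{\infty}$, $\mathscr{G}$ must contain
$\mathscr{F}_{\infty}$ by the Corollary to Theorem 2.1.2, proving the theorem.

Without using Theorem 2.1.2, one can verify that $\mathscr{G}$ is closed with respect
to complementation (trivial), finite union (by Exercise 1 of Sec. 2.1), and
countable union (as increasing limit of finite unions). Hence $\mathscr{G}$ is a B.F. that
must contain $\mathscr{F}_{\infty}$.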


It will be convenient to use a particular type of sample space $\Omega$. In the
notation of Sec. 3.4, let

$\Omega = \mathop{\times}_{n=1}^{\infty} \Omega_n,$

where each $\Omega_n$ is a “copy” of the real line $\mathscr{R}^1$. Thus $\Omega$ is just the space of all
infinite sequences of real numbers. A point $\omega$ will be written as $\{\omega_n, n \in N\}$,
and $\omega_n$ as a function of $\omega$ will be called the $n$th coordinate (function) of $\omega$.
Each $\Omega_n$ is endowed with the Euclidean B.F. $\mathscr{B}^1$, and the product Borel field
$\mathscr{F}$ ($= \mathscr{F}_{\infty}$ in the notation above) on $\Omega$ is defined to be the B.F. generated by
the finite-product sets of the form

(3)    $\bigcap_{j=1}^{k} \{\omega : \omega_{n_j} \in B_{n_j}\}$

where $(n_1, \ldots, n_k)$ is an arbitrary finite subset of $N$ and where each $B_{n_j} \in \mathscr{B}^1$.
In contrast to the case discussed in Sec. 3.3, however, no restriction is made on
the p.m. $\mathscr{P}$ on $\mathscr{F}$. We shall not enter here into the matter of a general construction
of $\mathscr{P}$. The Kolmogorov extension theorem (Theorem 3.3.6) asserts that
on the concrete space-field $(\Omega, \mathscr{F})$ just specified, there exists a p.m. $\mathscr{P}$ whose
projection on each finite-dimensional subspace may be arbitrarily preassigned,
subject only to consistency (where one such subspace contains the other).
Theorem 3.3.4 is a particular case of this theorem.

In this chapter we shall use the concrete probability space above, of the 
so-called “function space type”, to simplify the exposition, but all the results 
below remain valid without this specification. This follows from a measure- 
preserving homomorphism between an abstract probability space and one of 
the function-space type; see Doob [17, chap. 2]. 

The chief advantage of this concrete representation of Q is that it enables 
us to define an important mapping on the space. 


DEFINITION OF THE SHIFT. The shift t is a mapping of © such that 
T.@ = {w,,n €N} > tw = {@n41,n €N}; 


in other words, the image of a point has as its nth coordinate the (n + 1)st 
coordinate of the original point. 


Clearly $\tau$ is an $\infty$-to-1 mapping and it is from $\Omega$ onto $\Omega$. Its iterates
are defined as usual by composition: $\tau^0$ = identity, $\tau^k = \tau \circ \tau^{k-1}$ for $k \ge 1$. It
induces a direct set mapping $\tau$ and an inverse set mapping $\tau^{-1}$ according to
the usual definitions. Thus

$\tau^{-1}\Lambda = \{\omega : \tau\omega \in \Lambda\}$

and $\tau^{-n}$ is the $n$th iterate of $\tau^{-1}$. If $\Lambda$ is the set in (3), then

(4)    $\tau^{-1}\Lambda = \bigcap_{j=1}^{k} \{\omega : \omega_{n_j+1} \in B_{n_j}\}$.

It follows from this that $\tau^{-1}$ maps $\mathcal{F}$ into $\mathcal{F}$; more precisely,

$\forall \Lambda \in \mathcal{F}: \tau^{-n}\Lambda \in \mathcal{F}_n', \quad n \in N,$

where $\mathcal{F}_n'$ is the Borel field generated by $\{\omega_k, k > n\}$. This is obvious by (4)
if $\Lambda$ is of the form above, and since the class of $\Lambda$ for which the assertion
holds is a B.F., the result is true in general.


DEFINITION. A set in $\mathcal{F}$ is called invariant (under the shift) iff $\Lambda = \tau^{-1}\Lambda$.
An r.v. $Y$ on $\Omega$ is invariant iff $Y(\omega) = Y(\tau\omega)$ for every $\omega \in \Omega$.


Observe the following general relation, valid for each point mapping $\tau$
and the associated inverse set mapping $\tau^{-1}$, each function $Y$ on $\Omega$ and each
subset $A$ of $\mathcal{R}^1$:

(5)    $\tau^{-1}\{\omega : Y(\omega) \in A\} = \{\omega : Y(\tau\omega) \in A\}$.

This follows from $\tau^{-1} \circ Y^{-1} = (Y \circ \tau)^{-1}$.
We shall need another kind of mapping of $\Omega$. A permutation on $N_n$ is a
1-to-1 mapping of $N_n$ to itself, denoted as usual by

$\begin{pmatrix} 1, & 2, & \ldots, & n \\ \sigma 1, & \sigma 2, & \ldots, & \sigma n \end{pmatrix}$.

The collection of such mappings forms a group with respect to composition.
A finite permutation on $N$ is by definition a permutation on a certain “initial
segment” $N_n$ of $N$. Given such a permutation as shown above, we define $\sigma\omega$
to be the point in $\Omega$ whose coordinates are obtained from those of $\omega$ by the
corresponding permutation, namely

$(\sigma\omega)_j = \begin{cases} \omega_{\sigma j}, & \text{if } j \in N_n; \\ \omega_j, & \text{if } j \in N \setminus N_n. \end{cases}$


As usual, $\sigma$ induces a direct set mapping $\sigma$ and an inverse set mapping $\sigma^{-1}$,
the latter being also the direct set mapping induced by the “group inverse”
$\sigma^{-1}$ of $\sigma$. In analogy with the preceding definition we have the following.


DEFINITION. A set in $\mathcal{F}$ is called permutable iff $\Lambda = \sigma\Lambda$ for every finite
permutation $\sigma$ on $N$. A function $Y$ on $\Omega$ is permutable iff $Y(\omega) = Y(\sigma\omega)$ for
every finite permutation $\sigma$ and every $\omega \in \Omega$.
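
The action of a finite permutation on a point can be sketched in Python as follows. The representation of a point by a finite tuple and the helper names `apply_permutation` and `inverse` are assumptions made for illustration. The sketch applies the coordinate rule for a permutation of the first n indices, leaves the later coordinates untouched, and checks that the group inverse undoes the permutation.

```python
from typing import Sequence, Tuple


def apply_permutation(sigma: Sequence[int], omega: Sequence[float]) -> Tuple[float, ...]:
    """Permute the first n coordinates of omega; leave the rest unchanged.

    `sigma` lists the images of 1, ..., n (1-based), so that coordinate j
    of the result is coordinate sigma[j - 1] of omega for j <= n.
    """
    n = len(sigma)
    if sorted(sigma) != list(range(1, n + 1)):
        raise ValueError("sigma must be a permutation of 1..n")
    if n > len(omega):
        raise ValueError("omega is truncated before coordinate n")
    head = tuple(omega[s - 1] for s in sigma)
    return head + tuple(omega[n:])


def inverse(sigma: Sequence[int]) -> Tuple[int, ...]:
    """Return the group inverse of sigma, again as a tuple of 1-based images."""
    inv = [0] * len(sigma)
    for j, s in enumerate(sigma, start=1):
        inv[s - 1] = j
    return tuple(inv)


omega = (10.0, 20.0, 30.0, 40.0, 50.0)
sigma = (2, 3, 1)
permuted = apply_permutation(sigma, omega)
print(permuted)                                     # only the first three coordinates move
print(apply_permutation(inverse(sigma), permuted))  # recovers omega
```

Any function of the point that depends on the first n coordinates only through a symmetric expression, such as their sum, is unchanged by every permutation on that initial segment.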


It is fairly obvious that an invariant set is remote and a remote set is
permutable; also that each of the collections: all invariant events, all remote
events, all permutable events, forms a sub-B.F. of $\mathcal{F}$. If each of these B.F.’s
is augmented (see Exercise 20 of Sec. 2.2), the resulting augmented B.F.’s
will be called “almost invariant”, “almost remote”, and “almost permutable”,
respectively. Finally, the collection of all sets in $\mathcal{F}$ of probability either 0 or
1 clearly forms a B.F., which may be called the “all-or-nothing” field. This
B.F. will also be referred to as “almost trivial”.

Now that we have defined all these concepts for the general stochastic 
process, it must be admitted that they are not very useful without further 
specifications of the process. We proceed at once to the particular case below. 


DEFINITION. A sequence of independent r.v.’s will be called an indepen- 
dent process; it is called a stationary independent process iff the r.v.’s have a 
common distribution. 


Aspects of this type of process have been our main object of study,
usually under additional assumptions on the distributions. Having christened
it, we shall henceforth focus our attention on “the evolution of the process
as a whole”, whatever this phrase may mean. For this type of process, the
specific probability triple described above has been constructed in Sec. 3.3.
Indeed, $\mathcal{F} = \mathcal{F}_\infty$, and the sequence of independent r.v.’s is just that of the
successive coordinate functions $\{\omega_n, n \in N\}$, which, however, will also be
interchangeably denoted by $\{X_n, n \in N\}$. If $\varphi$ is any Borel measurable
function, then $\{\varphi(X_n), n \in N\}$ is another such process.

The following result is called Kolmogorov’s “zero-or-one law”. 


Theorem 8.1.2. For an independent process, each remote event has proba- 
bility zero or one. 


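PROOF. Let $\Lambda \in \bigcap_{n=1}^{\infty} \mathcal{F}_n'$ and suppose that $\mathcal{P}(\Lambda) > 0$; we are going to
prove that $\mathcal{P}(\Lambda) = 1$. Since $\mathcal{F}_n$ and $\mathcal{F}_n'$ are independent fields (see Exercise 5
of Sec. 3.3), $\Lambda$ is independent of every set in $\mathcal{F}_n$ for each $n \in N$; namely, if
$M \in \bigcup_{n=1}^{\infty} \mathcal{F}_n$, then

(6)    $\mathcal{P}(\Lambda \cap M) = \mathcal{P}(\Lambda)\mathcal{P}(M)$.

If we set

$\mathcal{P}_\Lambda(M) = \dfrac{\mathcal{P}(\Lambda \cap M)}{\mathcal{P}(\Lambda)}$

for $M \in \mathcal{F}$, then $\mathcal{P}_\Lambda(\cdot)$ is clearly a p.m. (the conditional probability relative to
$\Lambda$; see Chapter 9). By (6) it coincides with $\mathcal{P}$ on $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ and consequently
also on $\mathcal{F}$ by Theorem 2.2.3. Hence we may take $M$ to be $\Lambda$ in (6) to conclude
that $\mathcal{P}(\Lambda) = \mathcal{P}(\Lambda)^2$ or $\mathcal{P}(\Lambda) = 1$.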


The usefulness of the notions of shift and permutation in a stationary
independent process is based on the next result, which says that both $\tau^{-1}$ and
$\sigma$ are “measure-preserving”.

Theorem 8.1.3. For a stationary independent process, if $\Lambda \in \mathcal{F}$ and $\sigma$ is
any finite permutation, we have

(7)    $\mathcal{P}(\tau^{-1}\Lambda) = \mathcal{P}(\Lambda)$;
(8)    $\mathcal{P}(\sigma\Lambda) = \mathcal{P}(\Lambda)$.

PROOF. Define a set function $\tilde{\mathcal{P}}$ on $\mathcal{F}$ as follows:

$\tilde{\mathcal{P}}(\Lambda) = \mathcal{P}(\tau^{-1}\Lambda)$.

Since $\tau^{-1}$ maps disjoint sets into disjoint sets, it is clear that $\tilde{\mathcal{P}}$ is a p.m. For
a finite-product set $\Lambda$, such as the one in (3), it follows from (4) that

$\tilde{\mathcal{P}}(\Lambda) = \prod_{j=1}^{k} \mu(B_{n_j}) = \mathcal{P}(\Lambda)$.

Hence $\mathcal{P}$ and $\tilde{\mathcal{P}}$ coincide also on each set that is the union of a finite number
of such disjoint sets, and so on the B.F. $\mathcal{F}$ generated by them, according to
Theorem 2.2.3. This proves (7); (8) is proved similarly.


The following companion to Theorem 8.1.2, due to Hewitt and Savage, 
is very useful. 


Theorem 8.1.4. For a stationary independent process, each permutable event
has probability zero or one.


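PROOF. Let $\Lambda$ be a permutable event. Given $\epsilon > 0$, we may choose $\epsilon_k > 0$
so that

$\sum_{k=1}^{\infty} \epsilon_k \le \epsilon$.

By Theorem 8.1.1, there exists $\Lambda_k \in \mathcal{F}_{n_k}$ such that $\mathcal{P}(\Lambda \,\triangle\, \Lambda_k) \le \epsilon_k$, and we
may suppose that $n_k \uparrow \infty$. Let

$\sigma = \begin{pmatrix} 1, \ldots, n_k, & n_k + 1, \ldots, 2n_k \\ n_k + 1, \ldots, 2n_k, & 1, \ldots, n_k \end{pmatrix}$

and $M_k = \sigma\Lambda_k$. Then clearly $M_k \in \mathcal{F}_{n_k}'$. It follows from (8) that

$\mathcal{P}(\Lambda \,\triangle\, M_k) \le \epsilon_k$.

For any sequence of sets $\{E_k\}$ in $\mathcal{F}$, we have

$\mathcal{P}(\limsup_k E_k) = \mathcal{P}\left(\bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} E_k\right) \le \sum_{k=1}^{\infty} \mathcal{P}(E_k)$.

Applying this to $E_k = \Lambda \,\triangle\, M_k$, and observing the simple identity

$\limsup (\Lambda \,\triangle\, M_k) = (\Lambda \setminus \liminf M_k) \cup (\limsup M_k \setminus \Lambda)$,

we deduce that

$\mathcal{P}(\Lambda \,\triangle\, \limsup M_k) \le \mathcal{P}(\limsup (\Lambda \,\triangle\, M_k)) \le \epsilon$.

Since $\bigcup_{k=m}^{\infty} M_k \in \mathcal{F}_{n_m}'$, the set $\limsup M_k$ belongs to $\bigcap_{m=1}^{\infty} \mathcal{F}_{n_m}'$, which is seen
to coincide with the remote field. Thus $\limsup M_k$ has probability zero or one
by Theorem 8.1.2, and the same must be true of $\Lambda$, since $\epsilon$ is arbitrary in the
inequality above.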

Here is a more transparent proof of the theorem based on the metric on
the measure space $(\Omega, \mathcal{F}, \mathcal{P})$ given in Exercise 8 of Sec. 3.2. Since $\Lambda_k$ and
$M_k$ are independent, we have

$\mathcal{P}(\Lambda_k \cap M_k) = \mathcal{P}(\Lambda_k)\mathcal{P}(M_k)$.

Now $\Lambda_k \to \Lambda$ and $M_k \to \Lambda$ in the metric just mentioned, hence also

$\Lambda_k \cap M_k \to \Lambda \cap \Lambda$

in the same sense. Since convergence of events in this metric implies
convergence of their probabilities, it follows that $\mathcal{P}(\Lambda \cap \Lambda) = \mathcal{P}(\Lambda)\mathcal{P}(\Lambda)$, and the
theorem is proved.


Corollary. For a stationary independent process, the B.F.’s of almost 
permutable or almost remote or almost invariant sets all coincide with the 
all-or-nothing field. 


EXERCISES 
$\Omega$ and $\mathcal{F}$ are the infinite product space and field specified above.


1. Find an example of a remote field that is not the trivial one; to make 
it interesting, insist that the r.v.’s are not identical. 

2. An r.v. belongs to the all-or-nothing field if and only if it is constant
a.e.

3. If $\Lambda$ is invariant then $\Lambda = \tau\Lambda$; the converse is false.

4. An r.v. is invariant [permutable] if and only if it belongs to the
invariant [permutable] field.

5. The set of convergence of an arbitrary sequence of r.v.’s $\{Y_n, n \in N\}$
or of the sequence of their partial sums $\sum_{j=1}^{n} Y_j$ are both permutable. Their
limits are permutable r.v.’s with domain the set of convergence.




*6. If $a_n > 0$, $\lim_{n \to \infty} a_n$ exists $> 0$ finite or infinite, and $\lim_{n \to \infty} (a_{n+1}/a_n)$
$= 1$, then the set of convergence of $\{a_n^{-1} \sum_{j=1}^{n} Y_j\}$ is invariant. If $a_n \to +\infty$,
the upper and lower limits of this sequence are invariant r.v.’s.

*7. The set $\{Y_{2n} \in A \text{ i.o.}\}$, where $A \in \mathcal{B}^1$, is remote but not necessarily
invariant; the set $\{\sum_{j=1}^{n} Y_j \in A \text{ i.o.}\}$ is permutable but not necessarily remote.
Find some other essentially different examples of these two kinds.

8. Find trivial examples of independent processes where the three
numbers $\mathcal{P}(\tau^{-1}\Lambda)$, $\mathcal{P}(\Lambda)$, $\mathcal{P}(\tau\Lambda)$ take the values 1, 0, 1; or 0, $\frac{1}{2}$, 1.

9. Prove that an invariant event is remote and a remote event is 
permutable. 


*10. Consider the bi-infinite product space of all bi-infinite sequences of
real numbers $\{\omega_n, n \in \tilde{N}\}$, where $\tilde{N}$ is the set of all integers in its natural
(algebraic) ordering. Define the shift as in the text with $\tilde{N}$ replacing $N$, and
show that it is 1-to-1 on this space. Prove the analogue of (7).

11. Show that the conclusion of Theorem 8.1.4 holds true for a sequence
of independent r.v.’s, not necessarily stationary, but satisfying the following
condition: for every $j$ there exists a $k > j$ such that $X_k$ has the same
distribution as $X_j$. [This remark is due to Susan Horn.]

12. Let $\{X_n, n \ge 1\}$ be independent r.v.’s with $\mathcal{P}\{X_n = 4^{-n}\} = \mathcal{P}\{X_n = -4^{-n}\} = \frac{1}{2}$.
Then the remote field of $\{S_n, n \ge 1\}$, where $S_n = \sum_{j=1}^{n} X_j$, is not
trivial.


8.2 Basic notions 


From now on we consider only a stationary independent process $\{X_n, n \in N\}$
on the concrete probability triple specified in the preceding section. The
common distribution of $X_n$ will be denoted by $\mu$ (p.m.) or $F$ (d.f.); when only
this is involved, we shall write $X$ for a representative $X_n$, thus $\mathcal{E}(X)$ for $\mathcal{E}(X_n)$.

Our interest in such a process derives mainly from the fact that it under- 
lies another process of richer content. This is obtained by forming the succes- 
sive partial sums as follows: 


(1)    $S_n = \sum_{j=1}^{n} X_j, \quad n \in N.$

An initial r.v. $S_0 = 0$ is adjoined whenever this serves notational convenience,
as in $X_n = S_n - S_{n-1}$ for $n \in N$. The sequence $\{S_n, n \in N\}$ is then a very
familiar object in this book, but now we wish to find a proper name for it. An
officially correct one would be “stochastic process with stationary independent
differences”; the name “homogeneous additive process” can also be used. We
have, however, decided to call it a “random walk (process)”, although the use
of this term is frequently restricted to the case when $\mu$ is of the integer lattice
type or even more narrowly a Bernoullian distribution.


DEFINITION OF RANDOM WALK. A random walk is the process $\{S_n, n \in N\}$
defined in (1) where $\{X_n, n \in N\}$ is a stationary independent process. By
convention we set also $S_0 = 0$.

A similar definition applies in a Euclidean space of any dimension, but
we shall be concerned only with $\mathcal{R}^1$ except in some exercises later.

Let us observe that even for an independent process $\{X_n, n \in N\}$, its
remote field is in general different from the remote field of $\{S_n, n \in N\}$, where
$S_n = \sum_{j=1}^{n} X_j$. They are almost the same, being both almost trivial, for a
stationary independent process by virtue of Theorem 8.1.4, since the remote
field of the random walk is clearly contained in the permutable field of the
corresponding stationary independent process.

We add that, while the notion of remoteness applies to any process,
“(shift)-invariant” and “permutable” will be used here only for the underlying
“coordinate process” $\{\omega_n, n \in N\}$ or $\{X_n, n \in N\}$.

The following relation will be much used below, for $m < n$:

$S_{n-m}(\tau^m \omega) = \sum_{j=1}^{n-m} X_j(\tau^m \omega) = \sum_{j=1}^{n-m} X_{j+m}(\omega) = S_n(\omega) - S_m(\omega)$.

It follows from Theorem 8.1.3 that $S_{n-m}$ and $S_n - S_m$ have the same
distribution. This is obvious directly, since it is just $\mu^{(n-m)*}$.
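
The relation between the shifted path and the increments can be checked on a concrete path. The following Python sketch uses an arbitrary list of integer steps chosen for illustration (the data and the helper name `partial_sums` are assumptions, not taken from the text) and compares the partial sum of length n - m computed on the m-fold shifted point with the difference of the partial sums at times n and m on the original point.

```python
from itertools import accumulate
from typing import List, Sequence


def partial_sums(x: Sequence[int]) -> List[int]:
    """Return [S_0, S_1, ..., S_N] with S_0 = 0 for the steps x = (X_1, ..., X_N)."""
    return [0] + list(accumulate(x))


steps = [1, -2, 3, 3, -1, 2, -4]   # illustrative values of X_1, ..., X_7
S = partial_sums(steps)

m, n = 3, 6
shifted_steps = steps[m:]          # leading coordinates of the m-fold shifted point
S_shifted = partial_sums(shifted_steps)

lhs = S_shifted[n - m]             # partial sum of length n - m on the shifted point
rhs = S[n] - S[m]                  # increment of the original walk between m and n
print(lhs, rhs)
```

The two numbers agree for every choice of m < n within the length of the path, which is the pathwise content of the displayed relation.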

As an application of the results of Sec. 8.1 to a random walk, we state 

the following consequence of Theorem 8.1.4. 


Theorem 8.2.1. Let $B_n \in \mathcal{B}^1$ for each $n \in N$. Then

$\mathcal{P}\{S_n \in B_n \text{ i.o.}\}$

is equal to zero or one.

PROOF. If $\sigma$ is a permutation on $N_m$, then $S_n(\sigma\omega) = S_n(\omega)$ for $n \ge m$,
hence the set

$\Lambda_m = \bigcup_{n=m}^{\infty} \{S_n \in B_n\}$

is unchanged under $\sigma^{-1}$ or $\sigma$. Since $\Lambda_m$ decreases as $m$ increases, it
follows that

$\bigcap_{m=1}^{\infty} \Lambda_m$

is permutable, and the theorem is proved.


Even for a fixed $B = B_n$ the result is significant, since it is by no means
evident that the set $\{S_n > 0 \text{ i.o.}\}$, for instance, is even vaguely invariant
or remote with respect to $\{X_n, n \in N\}$ (cf. Exercise 7 of Sec. 8.1). Yet the
preceding theorem implies that it is in fact almost invariant. This is the strength
of the notion of permutability as against invariance or remoteness.

For any serious study of the random walk process, it is imperative to 
introduce the concept of an “optional r.v.” This notion has already been used 
more than once in the book (where?) but has not yet been named. Since the 
basic inferences are very simple and are supposed to be intuitively obvious, 
it has been the custom in the literature until recently not to make the formal 
introduction at all. However, the reader will profit by meeting these funda- 
mental ideas for the theory of stochastic processes at the earliest possible time. 
They will be needed in the next chapter, too. 


DEFINITION OF OPTIONAL r.v. An r.v. $\alpha$ is called optional relative to the
arbitrary stochastic process $\{Z_n, n \in N\}$ iff it takes strictly positive integer
values or $+\infty$ and satisfies the following condition:

(2)    $\forall n \in N \cup \{\infty\}: \{\omega : \alpha(\omega) = n\} \in \mathcal{F}_n$,

where $\mathcal{F}_n$ is the B.F. generated by $\{Z_k, k \in N_n\}$.

Similarly if the process is indexed by $N^0$ (as in Chapter 9), then the range
of $\alpha$ will be $N^0$. Thus if the index $n$ is regarded as the time parameter, then
$\alpha$ effects a choice of time (an “option”) for each sample point $\omega$. One may
think of this choice as a time to “stop”, whence the popular alias “stopping
time”, but this is usually rather a momentary pause after which the process
proceeds again: time marches on!

Associated with each optional r.v. $\alpha$ there are two or three important
objects. First, the pre-$\alpha$ field $\mathcal{F}_\alpha$ is the collection of all sets in $\mathcal{F}_\infty$ of the form

(3)    $\bigcup_{1 \le n \le \infty} [\{\alpha = n\} \cap \Lambda_n]$,

where $\Lambda_n \in \mathcal{F}_n$ for each $n \in N \cup \{\infty\}$. This collection is easily verified to be
a B.F. (how?). If $\Lambda \in \mathcal{F}_\alpha$, then we have clearly $\Lambda \cap \{\alpha = n\} \in \mathcal{F}_n$ for every
$n$. This property also characterizes the members of $\mathcal{F}_\alpha$ (see Exercise 1 below).
Next, the post-$\alpha$ process is the process $\{Z_{\alpha+n}, n \in N\}$ defined on the trace
of the original probability triple on the set $\{\alpha < \infty\}$, where

(4)    $\forall n \in N: Z_{\alpha+n}(\omega) = Z_{\alpha(\omega)+n}(\omega)$.

Each $Z_{\alpha+n}$ is seen to be an r.v. with domain $\{\alpha < \infty\}$; indeed it is finite a.e.
there provided the original $Z_n$’s are finite a.e. It is easy to see that $Z_\alpha \in \mathcal{F}_\alpha$.
The post-$\alpha$ field $\mathcal{F}_\alpha'$ is the B.F. generated by the post-$\alpha$ process: it is a sub-B.F.
of $\{\alpha < \infty\} \cap \mathcal{F}_\infty$.

Instead of requiring $\alpha$ to be defined on all $\Omega$ but possibly taking the value
$\infty$, we may suppose it to be defined on a set $\Delta$ in $\mathcal{F}_\infty$. Note that a strictly
positive integer $n$ is an optional r.v. The concepts of pre-$\alpha$ and post-$\alpha$ fields
reduce in this case to the previous $\mathcal{F}_n$ and $\mathcal{F}_n'$.

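A vital example of optional r.v. is that of the first entrance time into a
given Borel set $A$:

(5)    $\alpha_A(\omega) = \begin{cases} \min\{n \in N : Z_n(\omega) \in A\} & \text{on } \bigcup_{n=1}^{\infty} \{\omega : Z_n(\omega) \in A\}; \\ +\infty & \text{elsewhere}. \end{cases}$

To see that this is optional, we need only observe that for each $n \in N$:

$\{\omega : \alpha_A(\omega) = n\} = \{\omega : Z_j(\omega) \in A^c, 1 \le j \le n - 1; Z_n(\omega) \in A\}$

which clearly belongs to $\mathcal{F}_n$; similarly for $n = \infty$.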
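
As an illustration, the first entrance time can be computed along a finite stretch of a path. In the Python sketch below, the function name `first_entrance_time`, the use of `math.inf` for the value at infinity, and the sample data are assumptions made for illustration. With only finitely many observations, a return value of `math.inf` means no entrance has been observed so far, not that none ever occurs.

```python
import math
from typing import Callable, Sequence, Union


def first_entrance_time(z: Sequence[float], in_A: Callable[[float], bool]) -> Union[int, float]:
    """Return the least 1-based index n with z[n - 1] in A, or math.inf if none.

    `z` holds the observed values (Z_1, ..., Z_N) and `in_A` is the indicator
    of the target set A.
    """
    for n, value in enumerate(z, start=1):
        if in_A(value):
            return n
    return math.inf


def is_positive(v: float) -> bool:
    """Indicator of the open half-line (0, infinity)."""
    return v > 0


path = [-1.0, -0.5, 0.0, 2.0, -3.0, 1.0]
print(first_entrance_time(path, is_positive))

# Changing the path after the entrance time does not change the answer:
# whether the time equals n is decided by the first n values alone.
altered = path[:4] + [100.0, -100.0]
print(first_entrance_time(altered, is_positive))
```

The second call reflects the defining property of an optional r.v.: the event that the time equals n depends only on the process up to time n.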

Concepts connected with optionality have everyday counterparts, implicit
in phrases such as “within thirty days of the accident (should it occur)”.
Historically, they arose from “gambling systems”, in which the gambler chooses
opportune times to enter his bets according to previous observations,
experiments, or whatnot. In this interpretation, $\alpha + 1$ is the time chosen to gamble
and is determined by events strictly prior to it. Note that, along with $\alpha$, $\alpha + 1$
is also an optional r.v., but the converse is false.

So far the notions are valid for an arbitrary process on an arbitrary triple.
We now return to a stationary independent process on the specified triple and
extend the notion of “shift” to an “$\alpha$-shift” as follows: $\tau^\alpha$ is a mapping on
$\{\alpha < \infty\}$ such that

(6)    $\tau^\alpha \omega = \tau^n \omega$ on $\{\omega : \alpha(\omega) = n\}$.

Thus the post-$\alpha$ process is just the process $\{X_n(\tau^\alpha \omega), n \in N\}$. Recalling that
$X_n$ is a mapping on $\Omega$, we may also write

(7)    $X_{\alpha+n}(\omega) = X_n(\tau^\alpha \omega) = (X_n \circ \tau^\alpha)(\omega)$

and regard $X_n \circ \tau^\alpha$, $n \in N$, as the r.v.’s of the new process. The inverse set
mapping $(\tau^\alpha)^{-1}$, to be written more simply as $\tau^{-\alpha}$, is defined as usual:

$\tau^{-\alpha}\Lambda = \{\omega : \tau^\alpha \omega \in \Lambda\}$.


Let us now prove the fundamental theorem about “stopping” a stationary 
independent process. 


Theorem 8.2.2. For a stationary independent process and an almost
everywhere finite optional r.v. $\alpha$ relative to it, the pre-$\alpha$ and post-$\alpha$ fields are
independent. Furthermore the post-$\alpha$ process is a stationary independent process
with the same common distribution as the original one.

PROOF. Both assertions are summarized in the formula below. For any
$\Lambda \in \mathcal{F}_\alpha$, $k \in N$, $B_j \in \mathcal{B}^1$, $1 \le j \le k$, we have

(8)    $\mathcal{P}\{\Lambda; X_{\alpha+j} \in B_j, 1 \le j \le k\} = \mathcal{P}\{\Lambda\} \prod_{j=1}^{k} \mu(B_j)$.

To prove (8), we observe that it follows from the definition of $\alpha$ and $\mathcal{F}_\alpha$ that

(9)    $\Lambda \cap \{\alpha = n\} = \Lambda_n \cap \{\alpha = n\} \in \mathcal{F}_n$,

where $\Lambda_n \in \mathcal{F}_n$ for each $n \in N$. Consequently we have

$\mathcal{P}\{\Lambda; \alpha = n; X_{\alpha+j} \in B_j, 1 \le j \le k\} = \mathcal{P}\{\Lambda_n; \alpha = n; X_{n+j} \in B_j, 1 \le j \le k\}$
$= \mathcal{P}\{\Lambda; \alpha = n\}\,\mathcal{P}\{X_{n+j} \in B_j, 1 \le j \le k\}$
$= \mathcal{P}\{\Lambda; \alpha = n\} \prod_{j=1}^{k} \mu(B_j)$,

where the second equation is a consequence of (9) and the independence of
$\mathcal{F}_n$ and $\mathcal{F}_n'$. Summing over $n \in N$, we obtain (8).


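An immediate corollary is the extension of (7) of Sec. 8.1 to an $\alpha$-shift.

Corollary. For each $\Lambda \in \mathcal{F}_\infty$ we have

(10)   $\mathcal{P}(\tau^{-\alpha}\Lambda) = \mathcal{P}(\Lambda)$.

Just as we iterate the shift $\tau$, we can iterate $\tau^\alpha$. Put $\alpha^1 = \alpha$, and define
$\alpha^k$ inductively by

$\alpha^{k+1}(\omega) = \alpha^k(\tau^\alpha \omega), \quad k \in N$.

Each $\alpha^k$ is finite a.e. if $\alpha$ is. Next, define $\beta_0 = 0$, and

$\beta_k = \sum_{j=1}^{k} \alpha^j, \quad k \in N$.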
We are now in a position to state the following result, which will be 
needed later. 


Theorem 8.2.3. Let $\alpha$ be an a.e. finite optional r.v. relative to a stationary
independent process. Then the random vectors $\{V_k, k \in N\}$, where

$V_k(\omega) = (\alpha^k(\omega), X_{\beta_{k-1}+1}(\omega), \ldots, X_{\beta_k}(\omega))$,

are independent and identically distributed.

PROOF. The independence follows from Theorem 8.2.2 by our showing
that $V_1, \ldots, V_{k-1}$ belong to the pre-$\beta_{k-1}$ field, while $V_k$ belongs to the
post-$\beta_{k-1}$ field. The details are left to the reader; cf. Exercise 6 below.

To prove that $V_k$ and $V_{k+1}$ have the same distribution, we may suppose
that $k = 1$. Then for each $n \in N$, and each $n$-dimensional Borel set $A$, we have

$\{\omega : \alpha^2(\omega) = n; (X_{\alpha^1+1}(\omega), \ldots, X_{\alpha^1+\alpha^2}(\omega)) \in A\}$
$= \{\omega : \alpha^1(\tau^\alpha \omega) = n; (X_1(\tau^\alpha \omega), \ldots, X_{\alpha^1}(\tau^\alpha \omega)) \in A\}$,

since

$X_{\alpha^1}(\tau^\alpha \omega) = X_{\alpha^1(\tau^\alpha \omega)}(\tau^\alpha \omega) = X_{\alpha^2(\omega)}(\tau^\alpha \omega)$
$= X_{\alpha^1(\omega)+\alpha^2(\omega)}(\omega) = X_{\alpha^1+\alpha^2}(\omega)$

by the quirk of notation that denotes by $X_\alpha(\cdot)$ the function whose value at
$\omega$ is given by $X_{\alpha(\omega)}(\omega)$ and by (7) with $n = \alpha^2(\omega)$. By (5) of Sec. 8.1, the
preceding set is the $\tau^{-\alpha}$-image (inverse image under $\tau^\alpha$) of the set

$\{\omega : \alpha^1(\omega) = n; (X_1(\omega), \ldots, X_{\alpha^1}(\omega)) \in A\}$,

and so by (10) has the same probability as the latter. This proves our assertion.


Corollary. The r.v.’s $\{Y_k, k \in N\}$, where

$Y_k(\omega) = \sum_{n=\beta_{k-1}+1}^{\beta_k} \varphi(X_n(\omega))$

and $\varphi$ is a Borel measurable function, are independent and identically
distributed.

For $\varphi \equiv 1$, $Y_k$ reduces to $\alpha^k$. For $\varphi(x) \equiv x$, $Y_k = S_{\beta_k} - S_{\beta_{k-1}}$. The reader
is advised to get a clear picture of the quantities $\alpha^k$, $\beta_k$, and $Y_k$ before
proceeding further, perhaps by considering a special case such as (5).
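
One way to form the suggested picture is to compute these quantities on a short path when the optional r.v. is a first entrance time of the coordinate process into a set. In the Python sketch below, the data, the choice of target set, and the function name `entrance_blocks` are assumptions made for illustration. The successive entrance epochs play the role of the partial sums of the iterated times, the gaps between them are the iterated times themselves, and each block sum runs over one stretch between consecutive epochs.

```python
from typing import Callable, List, Sequence, Tuple


def entrance_blocks(
    x: Sequence[float],
    in_A: Callable[[float], bool],
    phi: Callable[[float], float],
) -> Tuple[List[int], List[int], List[float]]:
    """Return (alphas, betas, ys) for the first entrance time of x into A.

    betas[k] is the k-th entrance epoch (betas[0] = 0), alphas[k - 1] is the
    k-th waiting time betas[k] - betas[k - 1], and ys[k - 1] is the sum of
    phi over the k-th block of steps, indices betas[k - 1] + 1, ..., betas[k].
    """
    betas = [0] + [n for n, v in enumerate(x, start=1) if in_A(v)]
    pairs = list(zip(betas, betas[1:]))
    alphas = [b - a for a, b in pairs]
    ys = [sum(phi(v) for v in x[a:b]) for a, b in pairs]
    return alphas, betas, ys


x = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]   # illustrative path; the target set is {1}
alphas, betas, ones = entrance_blocks(x, lambda v: v == 1, lambda v: 1)
_, _, increments = entrance_blocks(x, lambda v: v == 1, lambda v: v)
print(alphas, betas)
print(ones, increments)
```

With the constant function 1 the block sums coincide with the waiting times, and with the identity function they are the increments of the partial sums between consecutive epochs, matching the two special cases noted in the text.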

We shall now apply these considerations to obtain results on the “global 
behavior” of the random walk. These will be broad qualitative statements 
distinguished by their generality without any additional assumptions. 

The optional r.v. to be considered is the first entrance time into the strictly
positive half of the real line, namely $A = (0, \infty)$ in (5) above. Similar results
hold for $[0, \infty)$; and then by taking the negative of each $X_n$, we may deduce
the corresponding result for $(-\infty, 0]$ or $(-\infty, 0)$. Results obtained in this way
will be labeled as “dual” below. Thus, omitting $A$ from the notation:

(11)   $\alpha(\omega) = \begin{cases} \min\{n \in N : S_n > 0\} & \text{on } \bigcup_{n=1}^{\infty} \{\omega : S_n(\omega) > 0\}; \\ +\infty & \text{elsewhere}; \end{cases}$

and

$\forall n \in N: \{\alpha = n\} = \{S_j \le 0 \text{ for } 1 \le j \le n - 1;\ S_n > 0\}$.

Define also the r.v. $M_n$ as follows:

(12)   $\forall n \in N^0: M_n(\omega) = \max_{0 \le j \le n} S_j(\omega)$.

The inclusion of $S_0$ above in the maximum may seem artificial, but it does
not affect the next theorem and will be essential in later developments in the
next section. Since each $X_n$ is assumed to be finite a.e., so is each $S_n$ and $M_n$.
Since $M_n$ increases with $n$, it tends to a limit, finite or positive infinite, to be
denoted by

(13)   $M(\omega) = \lim_{n \to \infty} M_n(\omega) = \sup_{0 \le j < \infty} S_j(\omega)$.
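
The first entrance time into the strictly positive half-line and the running maximum are tied together pathwise: the walk has become strictly positive by time n exactly when the maximum over times 0 through n is strictly positive. The Python sketch below checks this on an arbitrary integer path; the data and the function names `running_maximum` and `first_positive_time` are assumptions made for illustration.

```python
import math
from itertools import accumulate
from typing import List, Sequence, Union


def running_maximum(x: Sequence[int]) -> List[int]:
    """Return [M_0, ..., M_N], the maxima of S_0, ..., S_n with S_0 = 0."""
    sums = [0] + list(accumulate(x))
    return list(accumulate(sums, max))


def first_positive_time(x: Sequence[int]) -> Union[int, float]:
    """Return the least n >= 1 with S_n > 0, or math.inf if none is observed."""
    s = 0
    for n, step in enumerate(x, start=1):
        s += step
        if s > 0:
            return n
    return math.inf


steps = [-1, -1, 1, 2, -3, 4]   # illustrative steps X_1, ..., X_6
M = running_maximum(steps)
alpha = first_positive_time(steps)
print(M, alpha)

# The entrance has occurred by time n exactly when M_n is strictly positive.
agreement = all((alpha <= n) == (M[n] > 0) for n in range(len(steps) + 1))
print(agreement)
```

Because the maximum includes the initial value 0, it is never negative, and it stays at 0 until the first entrance occurs.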


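Theorem 8.2.4. The statements (a), (b), and (c) below are equivalent; the
statements (a’), (b’), and (c’) are equivalent.

(a) $\mathcal{P}\{\alpha < +\infty\} = 1$;    (a’) $\mathcal{P}\{\alpha < +\infty\} < 1$;
(b) $\mathcal{P}\{\limsup_{n \to \infty} S_n = +\infty\} = 1$;    (b’) $\mathcal{P}\{\limsup_{n \to \infty} S_n = +\infty\} = 0$;
(c) $\mathcal{P}\{M = +\infty\} = 1$;    (c’) $\mathcal{P}\{M = +\infty\} = 0$.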


PROOF. If (a) is true, we may suppose $\alpha < \infty$ everywhere. Consider the
r.v. $S_\alpha$: it is strictly positive by definition and so $0 < \mathcal{E}(S_\alpha) \le +\infty$. By the
Corollary to Theorem 8.2.3, $\{S_{\beta_{k+1}} - S_{\beta_k}, k \ge 1\}$ is a sequence of independent
and identically distributed r.v.’s. Hence the strong law of large numbers
(Theorem 5.4.2 supplemented by Exercise 1 of Sec. 5.4) asserts that, if $\alpha^0 = 0$
and $S_0 = 0$:

$\dfrac{S_{\beta_n}}{n} = \dfrac{1}{n} \sum_{k=0}^{n-1} (S_{\beta_{k+1}} - S_{\beta_k}) \to \mathcal{E}(S_\alpha) > 0$ a.e.
This implies (b). Since $\limsup_{n \to \infty} S_n \le M$, (b) implies (c). It is trivial that (c)
implies (a). We have thus proved the equivalence of (a), (b), and (c). If (a’)
is true, then (a) is false, hence (b) is false. But the set

$\{\limsup_{n \to \infty} S_n = +\infty\}$

is clearly permutable (it is even invariant, but this requires a little more
reflection), hence (b’) is true by Theorem 8.1.4. Now any numerical sequence with
finite upper limit is bounded above, hence (b’) implies (c’). Finally, if (c’) is
true then (c) is false, hence (a) is false, and (a’) is true. Thus (a’), (b’), and
(c’) are also equivalent.

Theorem 8.2.5. For the general random walk, there are four mutually
exclusive possibilities, each taking place a.e.:

(i) $\forall n \in N: S_n = 0$;
(ii) $S_n \to -\infty$;
(iii) $S_n \to +\infty$;
(iv) $-\infty = \liminf_{n \to \infty} S_n < \limsup_{n \to \infty} S_n = +\infty$.


PROOF. If $X = 0$ a.e., then (i) happens. Excluding this, let $\varphi_1 = \limsup_n S_n$.
Then $\varphi_1$ is a permutable r.v., hence a constant $c$, possibly $\pm\infty$, a.e. by
Theorem 8.1.4. Since

$\limsup_n S_n = X_1 + \limsup_n (S_n - X_1)$,

we have $\varphi_1 = X_1 + \varphi_2$, where $\varphi_2(\omega) = \varphi_1(\tau\omega) = c$ a.e. Since $X_1 \not\equiv 0$, it
follows that $c = +\infty$ or $-\infty$. This means that

either $\limsup_n S_n = +\infty$ or $\limsup_n S_n = -\infty$.

By symmetry we have also

either $\liminf_n S_n = -\infty$ or $\liminf_n S_n = +\infty$.

These double alternatives yield the new possibilities (ii), (iii), or (iv), other
combinations being impossible.

This last possibility will be elaborated upon in the next section.


EXERCISES 
In Exercises 1-6, the stochastic process is arbitrary. 


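*1. $\alpha$ is optional if and only if $\forall n \in N: \{\alpha \le n\} \in \mathcal{F}_n$.
*2. For each optional $\alpha$ we have $\alpha \in \mathcal{F}_\alpha$ and $X_\alpha \in \mathcal{F}_\alpha$. If $\alpha$ and $\beta$ are both
optional and $\alpha \le \beta$, then $\mathcal{F}_\alpha \subset \mathcal{F}_\beta$.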
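3. If $\alpha_1$ and $\alpha_2$ are both optional, then so is $\alpha_1 \wedge \alpha_2$, $\alpha_1 \vee \alpha_2$, $\alpha_1 + \alpha_2$. If
$\alpha$ is optional and $\Delta \in \mathcal{F}_\alpha$, then $\alpha_\Delta$ defined below is also optional:

$\alpha_\Delta = \begin{cases} \alpha & \text{on } \Delta; \\ +\infty & \text{on } \Omega \setminus \Delta. \end{cases}$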


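*4. If $\alpha$ is optional and $\beta$ is optional relative to the post-$\alpha$ process, then
$\alpha + \beta$ is optional (relative to the original process).
5. $\forall k \in N$: $\alpha^1 + \cdots + \alpha^k$ is optional. [For the $\alpha$ in (11), this has been
called the $k$th ladder variable.]
*6. Prove the following relations:

$\alpha^k = \alpha \circ \tau^{\beta_{k-1}}; \quad \tau^{\beta_{k-1}} \circ \tau^\alpha = \tau^{\beta_k}; \quad X_{\beta_{k-1}+j} \circ \tau^\alpha = X_{\beta_k+j}$.

7. If $\alpha$ and $\beta$ are any two optional r.v.’s, then

$X_{\beta+j}(\tau^\alpha \omega) = X_{\alpha(\omega)+\beta(\tau^\alpha \omega)+j}(\omega)$;

$(\tau^\beta \circ \tau^\alpha)(\omega) = \tau^{\beta(\tau^\alpha \omega)+\alpha(\omega)}(\omega) \ne \tau^{\beta+\alpha}(\omega)$ in general.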


*8. Find an example of two optional r.v.’s $\alpha$ and $\beta$ such that $\alpha \le \beta$ but
$\mathcal{F}_\alpha' \not\supset \mathcal{F}_\beta'$. However, if $\gamma$ is optional relative to the post-$\alpha$ process and $\beta =
\alpha + \gamma$, then indeed $\mathcal{F}_\alpha' \supset \mathcal{F}_\beta'$. As a particular case, $\mathcal{F}_{\beta_k}'$ is decreasing (while
$\mathcal{F}_{\beta_k}$ is increasing) as $k$ increases.

9. Find an example of two optional r.v.’s $\alpha$ and $\beta$ such that $\alpha \le \beta$ but
$\beta - \alpha$ is not optional.

10. Generalize Theorem 8.2.2 to the case where the domain of definition
and finiteness of $\alpha$ is $\Delta$ with $0 < \mathcal{P}(\Delta) < 1$. [This leads to a useful extension
of the notion of independence. For a given $\Delta$ in $\mathcal{F}$ with $\mathcal{P}(\Delta) > 0$, two
events $\Lambda$ and $M$, where $M \subset \Delta$, are said to be independent relative to $\Delta$ iff
$\mathcal{P}\{\Lambda \cap \Delta \cap M\} = \mathcal{P}\{\Lambda \cap \Delta\}\mathcal{P}_\Delta\{M\}$.]

*11. Let $\{X_n, n \in N\}$ be a stationary independent process and $\{\alpha_k, k \in N\}$
a sequence of strictly increasing finite optional r.v.’s. Then $\{X_{\alpha_k+1}, k \in N\}$ is
a stationary independent process with the same common distribution as the
original process. [This is the gambling-system theorem first given by Doob in
1936.]

12. Prove the Corollary to Theorem 8.2.2. 

13. State and prove the analogue of Theorem 8.2.4 with $\alpha$ replaced by
$\alpha_{[0,\infty)}$. [The inclusion of 0 in the set of entrance causes a small difference.]

14. In an independent process where all $X_n$ have a common bound,
$\mathcal{E}\{\alpha\} < \infty$ implies $\mathcal{E}\{S_\alpha\} < \infty$ for each optional $\alpha$ [cf. Theorem 5.5.3].


8.3 Recurrence 


A basic question about the random walk is the range of the whole process:
$\bigcup_{n=1}^{\infty} S_n(\omega)$ for a.e. $\omega$; or, “where does it ever go?” Theorem 8.2.5 tells us
that, ignoring the trivial case where it stays put at 0, it either goes off to $-\infty$
or $+\infty$, or fluctuates between them. But how does it fluctuate? Exercise 9
below will show that the random walk can take large leaps from one end to
the other without stopping in any middle range more than a finite number of
times. On the other hand, it may revisit every neighborhood of every point an
infinite number of times. The latter circumstance calls for a definition.

DEFINITION. The number $x \in \mathcal{R}^1$ is called a recurrent value of the random
walk $\{S_n, n \in N\}$, iff for every $\epsilon > 0$ we have

(1)    $\mathcal{P}\{|S_n - x| < \epsilon \text{ i.o.}\} = 1$.

The set of all recurrent values will be denoted by $\mathfrak{R}$.


Taking a sequence of $\epsilon$ decreasing to zero, we see that (1) implies the
apparently stronger statement that the random walk is in each neighborhood
of $x$ i.o. a.e.

Let us also call the number $x$ a possible value of the random walk iff
for every $\epsilon > 0$, there exists $n \in N$ such that $\mathcal{P}\{|S_n - x| < \epsilon\} > 0$. Clearly a
recurrent value is a possible value of the random walk (see Exercise 2 below).


Theorem 8.3.1. The set $\mathfrak{R}$ is either empty or a closed additive group of real
numbers. In the latter case it reduces to the singleton $\{0\}$ if and only if $X \equiv 0$
a.e.; otherwise $\mathfrak{R}$ is either the whole $\mathcal{R}^1$ or the infinite cyclic group generated
by a nonzero number $c$, namely $\{\pm nc : n \in N^0\}$.


PROOF. Suppose $\mathfrak{R} \ne \emptyset$ throughout the proof. To prove that $\mathfrak{R}$ is a group,
let us show that if $x$ is a possible value and $y \in \mathfrak{R}$, then $y - x \in \mathfrak{R}$. Suppose
not; then there is a strictly positive probability that from a certain value of $n$
on, $S_n$ will not be in a certain neighborhood of $y - x$. Let us put for $z \in \mathcal{R}^1$:

(2)    $p_{\epsilon,m}(z) = \mathcal{P}\{|S_n - z| \ge \epsilon \text{ for all } n \ge m\}$;

so that $p_{2\epsilon,m}(y - x) > 0$ for some $\epsilon > 0$ and $m \in N$. Since $x$ is a possible
value, for the same $\epsilon$ we have a $k$ such that $\mathcal{P}\{|S_k - x| < \epsilon\} > 0$. Now the
two independent events $|S_k - x| < \epsilon$ and $|S_n - S_k - (y - x)| \ge 2\epsilon$ together
imply that $|S_n - y| > \epsilon$; hence

(3)    $p_{\epsilon,k+m}(y) = \mathcal{P}\{|S_n - y| \ge \epsilon \text{ for all } n \ge k + m\}$
       $\ge \mathcal{P}\{|S_k - x| < \epsilon\}\,\mathcal{P}\{|S_n - S_k - (y - x)| \ge 2\epsilon \text{ for all } n \ge k + m\}$.

The last-written probability is equal to $p_{2\epsilon,m}(y - x)$, since $S_n - S_k$ has the
same distribution as $S_{n-k}$. It follows that the first term in (3) is strictly positive,
contradicting the assumption that $y \in \mathfrak{R}$. We have thus proved that $\mathfrak{R}$ is an
additive subgroup of $\mathcal{R}^1$. It is trivial that $\mathfrak{R}$ as a subset of $\mathcal{R}^1$ is closed in the
Euclidean topology. A well-known proposition (proof?) asserts that the only
closed additive subgroups of $\mathcal{R}^1$ are those mentioned in the second sentence
of the theorem. Unless $X \equiv 0$ a.e., it has at least one possible value $x \ne 0$,
and the argument above shows that $-x = 0 - x \in \mathfrak{R}$ and consequently also
$x = 0 - (-x) \in \mathfrak{R}$. Suppose $\mathfrak{R}$ is not empty, then $0 \in \mathfrak{R}$. Hence $\mathfrak{R}$ is not a
singleton. The theorem is completely proved.

It is clear from the preceding theorem that the key to recurrence is the
value 0, for which we have the criterion below.


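Theorem 8.3.2. If for some $\epsilon > 0$ we have

(4)    $\sum_n \mathcal{P}\{|S_n| < \epsilon\} < \infty$,

then

(5)    $\mathcal{P}\{|S_n| < \epsilon \text{ i.o.}\} = 0$

(for the same $\epsilon$) so that $0 \notin \mathfrak{R}$. If for every $\epsilon > 0$ we have

(6)    $\sum_n \mathcal{P}\{|S_n| < \epsilon\} = \infty$,

then

(7)    $\mathcal{P}\{|S_n| < \epsilon \text{ i.o.}\} = 1$

for every $\epsilon > 0$ and so $0 \in \mathfrak{R}$.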


Remark. Actually if (4) or (6) holds for any $\epsilon > 0$, then it holds for
every $\epsilon > 0$; this fact follows from Lemma 1 below but is not needed here.
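
The criterion can be made tangible on the Bernoullian random walk, chosen here only as an illustration: each step is +1 with probability p and -1 with probability 1 - p. For this integer-valued walk and any tolerance not exceeding 1, the walk is within the tolerance of 0 exactly when it is at 0, which can happen only at even times 2n and then has probability C(2n, n) (p(1 - p))^n. The Python sketch below (the function name `return_series` is an assumption) computes partial sums of this series. For p different from 1/2 the full series has the closed form 1/|2p - 1|, while for p = 1/2 the partial sums grow without bound.

```python
def return_series(p: float, terms: int) -> float:
    """Partial sum over n = 0, ..., terms - 1 of the probability of being at 0 at time 2n.

    The walk takes steps +1 with probability p and -1 with probability 1 - p.
    """
    pq = p * (1.0 - p)
    total = 0.0
    term = 1.0  # n = 0: the walk starts at 0
    for n in range(terms):
        total += term
        # Ratio of consecutive central binomial coefficients is 2(2n + 1)/(n + 1).
        term *= 2.0 * (2 * n + 1) / (n + 1) * pq
    return total


biased = return_series(0.6, 2000)
print(biased)                       # close to 1/|2(0.6) - 1| = 5

fair_small = return_series(0.5, 10_000)
fair_large = return_series(0.5, 40_000)
print(fair_small, fair_large)       # keeps growing as more terms are added
```

In the biased case the series converges, so by the first part of the theorem 0 is not a recurrent value; in the symmetric case the divergence is consistent with the second part.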


PROOF. The first assertion follows at once from the convergence part of
the Borel–Cantelli lemma (Theorem 4.2.1). To prove the second part consider

$F = \liminf_n \{|S_n| \ge \epsilon\}$;

namely $F$ is the event that $|S_n| < \epsilon$ for only a finite number of values of $n$.
For each $\omega$ on $F$, there is an $m(\omega)$ such that $|S_n(\omega)| \ge \epsilon$ for all $n \ge m(\omega)$; it
follows that if we consider “the last time that $|S_n| < \epsilon$”, we have

$\mathcal{P}(F) = \sum_{m=0}^{\infty} \mathcal{P}\{|S_m| < \epsilon; |S_n| \ge \epsilon \text{ for all } n \ge m + 1\}$.

Since the two independent events $|S_m| < \epsilon$ and $|S_n - S_m| \ge 2\epsilon$ together imply
that $|S_n| > \epsilon$, we have

$1 \ge \mathcal{P}(F) \ge \sum_{m=1}^{\infty} \mathcal{P}\{|S_m| < \epsilon\}\,\mathcal{P}\{|S_n - S_m| \ge 2\epsilon \text{ for all } n \ge m + 1\}$
$= \sum_{m=1}^{\infty} \mathcal{P}\{|S_m| < \epsilon\}\, p_{2\epsilon,1}(0)$

by the previous notation (2), since $S_n - S_m$ has the same distribution as $S_{n-m}$.
Consequently (6) cannot be true unless $p_{2\epsilon,1}(0) = 0$. We proceed to extend
this to show that $p_{2\epsilon,k}(0) = 0$ for every $k \in N$. To this aim we fix $k$ and consider
the event

$A_m = \{|S_m| < \epsilon, |S_n| \ge \epsilon \text{ for all } n \ge m + k\}$;

then $A_m$ and $A_{m'}$ are disjoint whenever $m' \ge m + k$ and consequently (why?)

$k \ge \sum_{m=1}^{\infty} \mathcal{P}(A_m)$.

The argument above for the case $k = 1$ can now be repeated to yield

$k \ge \sum_{m=1}^{\infty} \mathcal{P}\{|S_m| < \epsilon\}\, p_{2\epsilon,k}(0)$,

and so $p_{2\epsilon,k}(0) = 0$ for every $\epsilon > 0$. Thus

$\mathcal{P}(F) = \lim_{k \to \infty} p_{\epsilon,k}(0) = 0$

which is equivalent to (7).
A simple sufficient condition for $0 \in \mathfrak{R}$, or equivalently for $\mathfrak{R} \ne \emptyset$, will
now be given.


Theorem 8.3.3. If the weak law of large numbers holds for the random walk
$\{S_n, n \in N\}$ in the form that $S_n/n \to 0$ in pr., then $\mathfrak{R} \ne \emptyset$.


PROOF. We need two lemmas, the first of which is also useful elsewhere. 


Lemma 1. For any $\epsilon > 0$ and $m \in N$ we have

(8)    $\sum_{n=0}^{\infty} \mathcal{P}\{|S_n| < m\epsilon\} \le 2m \sum_{n=0}^{\infty} \mathcal{P}\{|S_n| < \epsilon\}$.


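PROOF OF LEMMA 1. It is sufficient to prove that if the right member of (8)
is finite, then so is the left member and (8) is true. Put

$I = (-\epsilon, \epsilon), \quad J = [j\epsilon, (j + 1)\epsilon)$,

for a fixed $j \in N$; and denote by $\varphi_I$, $\varphi_J$ the respective indicator functions.
Denote also by $\alpha$ the first entrance time into $J$, as defined in (5) of Sec. 8.2
with $Z_n$ replaced by $S_n$ and $A$ by $J$. We have

(9)    $\mathcal{E}\left\{\sum_{n=0}^{\infty} \varphi_J(S_n)\right\} = \sum_{k=1}^{\infty} \int_{\{\alpha = k\}} \left\{\sum_{n=k}^{\infty} \varphi_J(S_n)\right\} d\mathcal{P}$.

The typical integral on the right side is by definition of $\alpha$ equal to

$\int_{\{\alpha = k\}} \left\{1 + \sum_{n=k+1}^{\infty} \varphi_J(S_n)\right\} d\mathcal{P} \le \int_{\{\alpha = k\}} \left\{1 + \sum_{n=k+1}^{\infty} \varphi_I(S_n - S_k)\right\} d\mathcal{P}$,

since $\{\alpha = k\} \subset \{S_k \in J\}$ and $\{S_k \in J\} \cap \{S_n \in J\} \subset \{S_n - S_k \in I\}$. Now $\{\alpha =
k\}$ and $S_n - S_k$ are independent, hence the last-written integral is equal to

$\mathcal{P}(\alpha = k)\left\{1 + \int_\Omega \sum_{n=k+1}^{\infty} \varphi_I(S_n - S_k)\, d\mathcal{P}\right\} = \mathcal{P}(\alpha = k)\,\mathcal{E}\left\{\sum_{n=0}^{\infty} \varphi_I(S_n)\right\}$,

since $\varphi_I(S_0) = 1$. Summing over $k$ and observing that $\varphi_J(0) = 1$ only if $j = 0$,
in which case $J \subset I$ and the inequality below is trivial, we obtain for each $j$:

$\mathcal{E}\left\{\sum_{n=0}^{\infty} \varphi_J(S_n)\right\} \le \mathcal{E}\left\{\sum_{n=0}^{\infty} \varphi_I(S_n)\right\}$.

Now if we write $J_j$ for $J$ and sum over $j$ from $-m$ to $m - 1$, the inequality
(8) ensues in disguised form.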


This lemma is often demonstrated by a geometrical type of argument. 
We have written out the preceding proof in some detail as an example of the 
maxim: whatever can be shown by drawing pictures can also be set down in 
symbols! 
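
The inequality of Lemma 1 can also be checked numerically on a simple case. The Python sketch below again assumes the Bernoullian walk with steps +1 and -1 (an illustrative choice, as are the function name `visit_sums`, the tolerance 1, and the finite horizon) and computes the exact distribution of the walk step by step. It then compares the expected number of visits to the wide window with 2m times the expected number of visits to the narrow one, both truncated at the same horizon.

```python
from typing import Dict, Tuple


def visit_sums(p: float, m: int, horizon: int) -> Tuple[float, float]:
    """Return the truncated sums appearing on the two sides of the inequality.

    The first value is the sum over n = 0, ..., horizon of the probability
    that |S_n| < m; the second is the same sum for |S_n| < 1, that is, for
    the event that the walk is at 0. Steps are +1 (prob. p) or -1 (prob. 1 - p).
    """
    dist: Dict[int, float] = {0: 1.0}   # exact law of S_n, starting from S_0 = 0
    wide = 0.0
    narrow = 0.0
    for _ in range(horizon + 1):
        wide += sum(pr for s, pr in dist.items() if abs(s) < m)
        narrow += dist.get(0, 0.0)
        nxt: Dict[int, float] = {}
        for s, pr in dist.items():
            nxt[s + 1] = nxt.get(s + 1, 0.0) + pr * p
            nxt[s - 1] = nxt.get(s - 1, 0.0) + pr * (1.0 - p)
        dist = nxt
    return wide, narrow


m = 3
wide_biased, narrow_biased = visit_sums(0.7, m, 300)
wide_fair, narrow_fair = visit_sums(0.5, m, 200)
print(wide_biased, 2 * m * narrow_biased)
print(wide_fair, 2 * m * narrow_fair)
```

In both runs the first printed number does not exceed the second. The factor 2m reflects the 2m half-open intervals of the narrow width that cover the wide window in the proof above.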


Lemma 2. Let the positive numbers $\{u_n(m)\}$, where $n \in N$ and $m$ is a real
number $\ge 1$, satisfy the following conditions:

(i) $\forall n$: $u_n(m)$ is increasing in $m$ and tends to 1 as $m \to \infty$;

(ii) $\exists c > 0$: $\sum_{n=0}^{\infty} u_n(m) \le cm \sum_{n=0}^{\infty} u_n(1)$ for all $m \ge 1$;

(iii) $\forall \delta > 0$: $\lim_{n \to \infty} u_n(\delta n) = 1$.

Then we have

(10)   $\sum_{n=0}^{\infty} u_n(1) = \infty$.

Remark. If (ii) is true for all integer $m \ge 1$, then it is true for all real
$m \ge 1$, with $c$ doubled.


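PROOF OF LEMMA 2. Suppose not; then for every $A > 0$:

$$\infty > \sum_{n=0}^{\infty} u_n(1) \ge \frac{1}{cm} \sum_{n=0}^{\infty} u_n(m) \ge \frac{1}{cm} \sum_{n=0}^{[Am]} u_n(m) \ge \frac{1}{cm} \sum_{n=0}^{[Am]} u_n\Big(\frac{n}{A}\Big),$$

the last step by (i), since $n \le Am$ means $n/A \le m$. Letting $m \to \infty$ and applying (iii) with $\delta = A^{-1}$, we obtain

$$\sum_{n=0}^{\infty} u_n(1) \ge \frac{A}{c};$$

since $A$ is arbitrary, this is a contradiction, which proves the lemma.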


To return to the proof of Theorem 8.3.3, we apply Lemma 2 with

$$u_n(m) = \mathscr{P}\{|S_n| < m\}.$$

Then condition (i) is obviously satisfied and condition (ii) with $c = 2$ follows from Lemma 1. The hypothesis that $S_n/n \to 0$ in pr. may be written as

$$u_n(\delta n) = \mathscr{P}\Big\{\Big|\frac{S_n}{n}\Big| < \delta\Big\} \to 1$$

for every $\delta > 0$ as $n \to \infty$, hence condition (iii) is also satisfied. Thus Lemma 2 yields

$$\sum_{n=0}^{\infty} \mathscr{P}\{|S_n| < 1\} = +\infty.$$

Applying this to the "magnified" random walk with each $X_n$ replaced by $X_n/\epsilon$, which does not disturb the hypothesis of the theorem, we obtain (6) for every $\epsilon > 0$, and so the theorem is proved.

In practice, the following criterion is more expedient (Chung and Fuchs, 
1951). 


Theorem 8.3.4. Suppose that at least one of $\mathscr{E}(X^+)$ and $\mathscr{E}(X^-)$ is finite. Then $\mathfrak{R} \ne \emptyset$ if and only if $\mathscr{E}(X) = 0$; otherwise case (ii) or (iii) of Theorem 8.2.5 happens according as $\mathscr{E}(X) < 0$ or $> 0$.


PROOF. If $-\infty \le \mathscr{E}(X) < 0$ or $0 < \mathscr{E}(X) \le +\infty$, then by the strong law of large numbers (as amended by Exercise 1 of Sec. 5.4), we have

$$\frac{S_n}{n} \to \mathscr{E}(X) \quad \text{a.e.},$$

so that either (ii) or (iii) happens as asserted. If $\mathscr{E}(X) = 0$, then the same law or its weaker form Theorem 5.2.2 applies; hence Theorem 8.3.3 yields the conclusion.

DEFINITION OF RECURRENT RANDOM WALK. A random walk will be called recurrent iff $\mathfrak{R} \ne \emptyset$; it is degenerate iff $\mathfrak{R} = \{0\}$; and it is of the lattice type iff $\mathfrak{R}$ is generated by a nonzero element.


The most exact kind of recurrence happens when the $X_n$'s have a common distribution which is concentrated on the integers, such that every integer is a possible value (for some $S_n$), and which has mean zero. In this case for each integer $c$ we have $\mathscr{P}\{S_n = c \text{ i.o.}\} = 1$. For the symmetrical Bernoullian random walk this was first proved by Pólya in 1921.
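
The dichotomy in Theorem 8.3.4 can be seen numerically on the Bernoullian walk with steps $+1$ (probability $p$) and $-1$ (probability $1-p$). The sketch below is an illustration under assumptions of our own choosing (the values of $p$, the truncation points, the function names); it sums $\mathscr{P}\{S_{2n} = 0\} = \binom{2n}{n}(p(1-p))^n$, which for an integer-valued walk is the series $\sum_n \mathscr{P}\{|S_n| < 1\}$. For $p \ne \tfrac12$ the series has the closed form $1/|2p - 1|$ and is finite; for $p = \tfrac12$ the partial sums grow without bound, roughly like $2\sqrt{N/\pi}$.

```python
def return_probs(p: float, terms: int) -> list:
    """P{S_{2n} = 0} for n = 0..terms-1, steps +1 w.p. p and -1 w.p. 1 - p."""
    probs = []
    u = 1.0
    for n in range(terms):
        probs.append(u)
        # C(2n+2, n+1) / C(2n, n) = 2(2n+1)/(n+1); the recurrence avoids huge integers.
        u *= 2.0 * (2 * n + 1) / (n + 1) * p * (1.0 - p)
    return probs


def partial_sum(p: float, terms: int) -> float:
    return sum(return_probs(p, terms))


print("biased walk, p = 0.7:", partial_sum(0.7, 400), "closed form:", 1.0 / abs(2 * 0.7 - 1.0))
for terms in (100, 400, 1600):
    print("symmetric walk, first", terms, "terms:", partial_sum(0.5, terms))
```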

We shall give another proof of the recurrence part of Theorem 8.3.4, namely that $\mathscr{E}(X) = 0$ is a sufficient condition for the random walk to be recurrent as just defined. This method is applicable in cases not covered by the two preceding theorems (see Exercises 6-9 below), and the analytic machinery of ch.f.'s which it employs opens the way to the considerations in the next section.

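The starting point is the integrated form of the inversion formula in Exercise 3 of Sec. 6.2. Taking $x = 0$, $u = \epsilon$, and $F$ to be the d.f. of $S_n$, we have

$$\mathscr{P}(|S_n| < \epsilon) \ge \frac{1}{\epsilon} \int_0^{\epsilon} \mathscr{P}(|S_n| < u)\, du = \frac{1}{\pi\epsilon} \int_{-\infty}^{\infty} \frac{1 - \cos \epsilon t}{t^2}\, f(t)^n\, dt. \tag{11}$$

Thus the series in (6) may be bounded below by summing the last expression in (11). The latter does not sum well as it stands, and it is natural to resort to a summability method. The Abelian method suits it well and leads to, for $0 < r < 1$:

$$\sum_{n=0}^{\infty} r^n \mathscr{P}(|S_n| < \epsilon) \ge \frac{1}{\pi\epsilon} \int_{-\infty}^{\infty} \frac{1 - \cos \epsilon t}{t^2}\, \mathbf{R}\, \frac{1}{1 - r f(t)}\, dt, \tag{12}$$

where $\mathbf{R}$ and $\mathbf{I}$ later denote the real and imaginary parts of a complex quantity. Since

$$\mathbf{R}\, \frac{1}{1 - r f(t)} \ge \frac{1 - r}{|1 - r f(t)|^2} > 0 \tag{13}$$

and $(1 - \cos \epsilon t)/t^2 \ge C\epsilon^2$ for $|\epsilon t| < 1$ and some constant $C$, it follows that for $\eta < 1/\epsilon$ the right member of (12) is not less than

$$\frac{C\epsilon}{\pi} \int_{-\eta}^{\eta} \mathbf{R}\, \frac{1}{1 - r f(t)}\, dt. \tag{14}$$

Now the existence of $\mathscr{E}(|X|)$ implies by Theorem 6.4.2 that $1 - f(t) = o(t)$ as $t \to 0$. Hence for any given $\delta > 0$ we may choose the $\eta$ above so that for $|t| \le \eta$:

$$|1 - r f(t)|^2 \le (1 - r + r[1 - \mathbf{R} f(t)])^2 + (r\, \mathbf{I} f(t))^2 \le 2(1 - r)^2 + 2(r\delta t)^2 + (r\delta t)^2 = 2(1 - r)^2 + 3r^2\delta^2 t^2.$$

By (13), the integral in (14) is then not less than

$$\int_{-\eta}^{\eta} \frac{(1 - r)\, dt}{2(1 - r)^2 + 3r^2\delta^2 t^2} \ge \frac{1}{3} \int_{-\eta(1-r)^{-1}}^{\eta(1-r)^{-1}} \frac{ds}{1 + \delta^2 s^2}.$$

As $r \uparrow 1$, the right member above tends to $\pi/(3\delta)$; since $\delta$ is arbitrary, we have proved that the right member of (12) tends to $+\infty$ as $r \uparrow 1$. Since the series in (6) dominates that on the left member of (12) for every $r$, it follows that (6) is true and so $0 \in \mathfrak{R}$ by Theorem 8.3.2.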
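
The lower bound for the real part of $1/(1 - r f(t))$ used in the argument above is stated without proof; a one-line verification, written out here as a supplementary step and using only $|f(t)| \le 1$ and $0 < r < 1$, is as follows.

```latex
\[
\mathbf{R}\,\frac{1}{1 - r f(t)}
  = \mathbf{R}\,\frac{1 - r\,\overline{f(t)}}{|1 - r f(t)|^{2}}
  = \frac{1 - r\,\mathbf{R} f(t)}{|1 - r f(t)|^{2}}
  \;\ge\; \frac{1 - r}{|1 - r f(t)|^{2}} \;>\; 0,
\]
% The inequality uses R f(t) <= |f(t)| <= 1; the denominator is nonzero
% because |r f(t)| <= r < 1.
```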


EXERCISES

$f$ is the ch.f. of $\mu$.


1. Generalize Theorems 8.3.1 and 8.3.2 to $\mathscr{R}^d$. (For $d \ge 3$ the generalization is illusory; see Exercise 12 below.)

*2. If a random walk in $\mathscr{R}^d$ is recurrent, then every possible value is a recurrent value.

3. Prove the Remark after Theorem 8.3.2.


*4. Assume that $\mathscr{P}\{X_1 = 0\} < 1$. Prove that $x$ is a recurrent value of the random walk if and only if

$$\sum_{n=1}^{\infty} \mathscr{P}\{|S_n - x| < \epsilon\} = \infty \quad \text{for every } \epsilon > 0.$$

5. For a recurrent random walk that is neither degenerate nor of the lattice type, the countable set of points $\{S_n(\omega), n \in N\}$ is everywhere dense in $\mathscr{R}^1$ for a.e. $\omega$. Hence prove the following result in Diophantine approximation: if $\gamma$ is irrational, then given any real $x$ and $\epsilon > 0$ there exist integers $m$ and $n$ such that $|m\gamma + n - x| < \epsilon$.

*6. If there exists a $\delta > 0$ such that (the integral below being real-valued)

$$\lim_{r \uparrow 1} \int_{-\delta}^{\delta} \frac{dt}{1 - r f(t)} = \infty,$$

then the random walk is recurrent.

*7. If there exists a $\delta > 0$ such that

$$\sup_{0 < r < 1} \int_{-\delta}^{\delta} \frac{dt}{1 - r f(t)} < \infty,$$

then the random walk is not recurrent. [HINT: Use Exercise 3 of Sec. 6.2 to show that there exists a constant $C(\epsilon)$ such that

$$\mathscr{P}(|S_n| < \epsilon) \le C(\epsilon) \int_{\mathscr{R}^1} \frac{1 - \cos \epsilon^{-1} x}{x^2}\, \mu_n(dx) \le \frac{C(\epsilon)}{2} \int_0^{1/\epsilon} du \int_{-u}^{u} f(t)^n\, dt,$$

where $\mu_n$ is the distribution of $S_n$, and use (13).] [Exercises 6 and 7 give, respectively, a sufficient and a necessary condition for recurrence. If $\mu$ is of the integer lattice type with span one, then

$$\int_{-\pi}^{\pi} \frac{dt}{1 - f(t)} = \infty$$

is such a condition according to Kesten and Spitzer.]

*8. Prove that the random walk with $f(t) = e^{-|t|}$ (Cauchy distribution) is recurrent.

*9. Prove that the random walk with $f(t) = e^{-|t|^{\alpha}}$, $0 < \alpha < 1$, (Stable law) is not recurrent, but (iv) of Theorem 8.2.5 holds.

10. Generalize Exercises 6 and 7 above to $\mathscr{R}^d$.

*11. Prove that in $\mathscr{R}^2$ if the common distribution of the random vector $(X, Y)$ has mean zero and finite second moment, namely:

$$\mathscr{E}(X) = 0, \quad \mathscr{E}(Y) = 0, \quad 0 < \mathscr{E}(X^2 + Y^2) < \infty,$$

then the random walk is recurrent. This implies by Exercise 5 above that almost every Brownian motion path is everywhere dense in $\mathscr{R}^2$. [HINT: Use the generalization of Exercise 6 and show that

$$\mathbf{R}\, \frac{1}{1 - f(t_1, t_2)} \ge \frac{c}{t_1^2 + t_2^2}$$

for sufficiently small $|t_1| + |t_2|$. One can also make a direct estimate:

$$\mathscr{P}(|S_n| < \epsilon) \ge \frac{c'}{n}.]$$

*12. Prove that no truly 3-dimensional random walk, namely one whose common distribution does not have its support in a plane, is recurrent. [HINT: There exists $A > 0$ such that

$$\iiint_{\|x\| < A} \Big(\sum_{i=1}^{3} t_i x_i\Big)^2 \mu(dx)$$

is a strictly positive quadratic form $Q$ in $(t_1, t_2, t_3)$. If

$$\sum_{i=1}^{3} |t_i| < A^{-1},$$

then

$$\mathbf{R}\{1 - f(t_1, t_2, t_3)\} \ge C\, Q(t_1, t_2, t_3).]$$


13. Generalize Lemma 1 in the proof of Theorem 8.3.3 to $\mathscr{R}^d$. For $d = 2$ the constant $2m$ in the right member of (8) is to be replaced by $4m^2$, and "$|S_n| < \epsilon$" means $S_n$ is in the open square with center at the origin and side length $2\epsilon$.

14. Extend Lemma 2 in the proof of Theorem 8.3.3 as follows. Keep condition (i) but replace (ii) and (iii) by

(ii') $\sum_{n=0}^{\infty} u_n(m) \le cm^2 \sum_{n=0}^{\infty} u_n(1)$;

There exists $d > 0$ such that for every $b > 1$ and $m \ge m(b)$:

(iii') $u_n(m) \ge \dfrac{dm^2}{n}$ for $m^2 \le n \le (bm)^2$.

Then (10) is true.*

* This form of condition (iii') is due to Hsu Pei; see also Chung and Lindvall, Proc. Amer. Math. Soc. Vol. 78 (1980), p. 285.

15. Generalize Theorem 8.3.3 to $\mathscr{R}^2$ as follows. If the central limit theorem applies in the form that $S_n/\sqrt{n}$ converges in dist. to the unit normal, then the random walk is recurrent. [HINT: Use Exercises 13 and 14 and Exercise 4 of § 4.3. This is sharper than Exercise 11. No proof of Exercise 12 using a similar method is known.]

16. Suppose $\mathscr{E}(X) = 0$, $0 < \mathscr{E}(X^2) < \infty$, and $\mu$ is of the integer lattice type, then

$$\mathscr{P}\{S_{n^2} = 0 \text{ i.o.}\} = 1.$$

17. The basic argument in the proof of Theorem 8.3.2 was the "last time in $(-\epsilon, \epsilon)$". A harder but instructive argument using the "first time" may be given as follows. Put

$$f_n^{(m)} = \mathscr{P}\{|S_j| \ge \epsilon \text{ for } m \le j \le n - 1;\ |S_n| < \epsilon\}, \qquad g_n(\epsilon) = \mathscr{P}\{|S_n| < \epsilon\}.$$

Show that for $1 \le m < M$:

$$\sum_{n=m}^{M} g_n(\epsilon) \le \sum_{n=m}^{M} f_n^{(m)} \sum_{n=0}^{M} g_n(2\epsilon).$$

It follows by a form of Lemma 1 in this section that

$$\lim_{m \to \infty} \sum_{n=m}^{\infty} f_n^{(m)} \ge \frac{1}{4};$$

now use Theorem 8.1.4.
18. For an arbitrary random walk, if $\mathscr{P}\{\forall n \in N: S_n > 0\} > 0$, then

$$\sum_{n} \mathscr{P}\{S_n \le 0,\ S_{n+1} > 0\} < \infty.$$

Hence if in addition $\mathscr{P}\{\forall n \in N: S_n \le 0\} > 0$, then

$$\sum_{n} |\mathscr{P}\{S_n > 0\} - \mathscr{P}\{S_{n+1} > 0\}| < \infty;$$

and consequently

$$\sum_{n} \frac{(-1)^n}{n}\, \mathscr{P}\{S_n > 0\} < \infty.$$

[HINT: For the first series, consider the last time that $S_n \le 0$; for the third series, apply Du Bois-Reymond's test. Cf. Theorem 8.4.4 below; this exercise will be completed in Exercise 15 of Sec. 8.5.]


8.4 Fine structure 


In this section we embark on a probing in depth of the r.v. $\alpha$ defined in (11) of Sec. 8.2 and some related r.v.'s.

The r.v. $\alpha$ being optional, the key to its introduction is to break up the time sequence into a pre-$\alpha$ and a post-$\alpha$ era, as already anticipated in the terminology employed with a general optional r.v. We do this with a sort of characteristic functional of the process which has already made its appearance in the last section:


$$\mathscr{E}\Big\{\sum_{n=0}^{\infty} r^n e^{itS_n}\Big\} = \sum_{n=0}^{\infty} r^n f(t)^n = \frac{1}{1 - r f(t)}, \tag{1}$$

where $0 < r < 1$, $t$ is real, and $f$ is the ch.f. of $X$. Applying the principle just enunciated, we break this up into two parts:


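$$\mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n e^{itS_n}\Big\} + \mathscr{E}\Big\{\sum_{n=\alpha}^{\infty} r^n e^{itS_n}\Big\}$$

with the understanding that on the set $\{\alpha = \infty\}$, the first sum above is $\sum_{n=0}^{\infty}$ while the second is empty and hence equal to zero. Now the second part may be written as

$$\mathscr{E}\Big\{\sum_{n=0}^{\infty} r^{\alpha + n} e^{itS_{\alpha + n}}\Big\} = \mathscr{E}\Big\{r^{\alpha} e^{itS_{\alpha}} \sum_{n=0}^{\infty} r^n e^{it(S_{\alpha + n} - S_{\alpha})}\Big\}. \tag{2}$$

It follows from (7) of Sec. 8.2 that

$$S_{\alpha + n} - S_{\alpha} = S_n \circ \tau^{\alpha}$$

has the same distribution as $S_n$, and by Theorem 8.2.2 that for each $n$ it is independent of $S_{\alpha}$. [Note that the same fact has been used more than once before, but for a constant $\alpha$.] Hence the right member of (2) is equal to

$$\mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\}\, \mathscr{E}\Big\{\sum_{n=0}^{\infty} r^n e^{itS_n}\Big\} = \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\}\, \frac{1}{1 - r f(t)},$$

where $r^{\alpha} e^{itS_{\alpha}}$ is taken to be 0 for $\alpha = \infty$. Substituting this into (1), we obtain

$$\frac{1}{1 - r f(t)}\, [1 - \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\}] = \mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n e^{itS_n}\Big\}. \tag{3}$$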

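We have

$$1 - \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\} = 1 - \sum_{n=1}^{\infty} r^n \int_{\{\alpha = n\}} e^{itS_n}\, d\mathscr{P}; \tag{4}$$

and

$$\mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n e^{itS_n}\Big\} = \sum_{k=1}^{\infty} \int_{\{\alpha = k\}} \sum_{n=0}^{k-1} r^n e^{itS_n}\, d\mathscr{P} = \sum_{n=0}^{\infty} r^n \int_{\{\alpha > n\}} e^{itS_n}\, d\mathscr{P} \tag{5}$$

by an interchange of summation. Let us record the two power series appearing in (4) and (5) as

$$P(r, t) = 1 - \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\} = \sum_{n=0}^{\infty} r^n p_n(t);$$

$$Q(r, t) = \mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n e^{itS_n}\Big\} = \sum_{n=0}^{\infty} r^n q_n(t),$$

where

$$p_0(t) \equiv 1, \qquad p_n(t) = -\int_{\{\alpha = n\}} e^{itS_n}\, d\mathscr{P} = -\int_{\mathscr{R}^1} e^{itx}\, U_n(dx);$$

$$q_n(t) = \int_{\{\alpha > n\}} e^{itS_n}\, d\mathscr{P} = \int_{\mathscr{R}^1} e^{itx}\, V_n(dx).$$

Now $U_n(\cdot) = \mathscr{P}\{\alpha = n; S_n \in \cdot\}$ is a finite measure with support in $(0, \infty)$, while $V_n(\cdot) = \mathscr{P}\{\alpha > n; S_n \in \cdot\}$ is a finite measure with support in $(-\infty, 0]$. Thus each $p_n$ is the Fourier transform of a finite measure in $(0, \infty)$, while each $q_n$ is the Fourier transform of a finite measure in $(-\infty, 0]$. Returning to (3), we may now write it as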


$$\frac{1}{1 - r f(t)}\, P(r, t) = Q(r, t). \tag{6}$$

The next step is to observe the familiar Taylor series:

$$\frac{1}{1 - x} = \exp\Big\{\sum_{n=1}^{\infty} \frac{x^n}{n}\Big\}, \qquad |x| < 1.$$

Thus we have


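$$\frac{1}{1 - r f(t)} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} f(t)^n\Big\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} \Big[\int_{\{S_n > 0\}} e^{itS_n}\, d\mathscr{P} + \int_{\{S_n \le 0\}} e^{itS_n}\, d\mathscr{P}\Big]\Big\} = f_+(r, t)^{-1} f_-(r, t), \tag{7}$$

where

$$f_+(r, t) = \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{(0, \infty)} e^{itx}\, \mu_n(dx)\Big\},$$

$$f_-(r, t) = \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{(-\infty, 0]} e^{itx}\, \mu_n(dx)\Big\},$$

and $\mu_n(\cdot) = \mathscr{P}\{S_n \in \cdot\}$ is the distribution of $S_n$. Since the convolution of two measures both with support in $(0, \infty)$ has support in $(0, \infty)$, and the convolution of two measures both with support in $(-\infty, 0]$ has support in $(-\infty, 0]$, it follows by expansion of the exponential functions above and rearrangements of the resulting double series that

$$f_+(r, t) = 1 + \sum_{n=1}^{\infty} r^n \varphi_n(t), \qquad f_-(r, t) = 1 + \sum_{n=1}^{\infty} r^n \psi_n(t),$$

where each $\varphi_n$ is the Fourier transform of a measure in $(0, \infty)$, while each $\psi_n$ is the Fourier transform of a measure in $(-\infty, 0]$. Substituting (7) into (6) and multiplying through by $f_+(r, t)$, we obtain

$$P(r, t)\, f_-(r, t) = Q(r, t)\, f_+(r, t). \tag{8}$$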
The next theorem below supplies the basic analytic technique for this development, known as the Wiener-Hopf technique.


Theorem 8.4.1. Let

$$P(r, t) = \sum_{n=0}^{\infty} r^n p_n(t), \qquad Q(r, t) = \sum_{n=0}^{\infty} r^n q_n(t),$$

$$P^*(r, t) = \sum_{n=0}^{\infty} r^n p_n^*(t), \qquad Q^*(r, t) = \sum_{n=0}^{\infty} r^n q_n^*(t),$$

where $p_0(t) \equiv q_0(t) \equiv p_0^*(t) \equiv q_0^*(t) \equiv 1$; and for $n \ge 1$, $p_n$ and $p_n^*$ as functions of $t$ are Fourier transforms of measures with support in $(0, \infty)$; $q_n$ and $q_n^*$ as functions of $t$ are Fourier transforms of measures in $(-\infty, 0]$. Suppose that for some $r_0 > 0$ the four power series converge for $r$ in $(0, r_0)$ and all real $t$, and the identity

$$P(r, t)\, Q^*(r, t) \equiv P^*(r, t)\, Q(r, t) \tag{9}$$

holds there. Then

$$P \equiv P^*, \qquad Q \equiv Q^*.$$

The theorem is also true if $(0, \infty)$ and $(-\infty, 0]$ are replaced by $[0, \infty)$ and $(-\infty, 0)$, respectively.


PROOF. It follows from (9) and the identity theorem for power series that for every $n \ge 0$:

$$\sum_{k=0}^{n} p_k(t)\, q_{n-k}^*(t) \equiv \sum_{k=0}^{n} p_k^*(t)\, q_{n-k}(t). \tag{10}$$

Then for $n = 1$ equation (10) reduces to the following:

$$p_1(t) - p_1^*(t) \equiv q_1(t) - q_1^*(t).$$

By hypothesis, the left member above is the Fourier transform of a finite signed measure $\nu_1$ with support in $(0, \infty)$, while the right member is the Fourier transform of a finite signed measure $\nu_2$ with support in $(-\infty, 0]$. It follows from the uniqueness theorem for such transforms (Exercise 13 of Sec. 6.2) that we must have $\nu_1 \equiv \nu_2$, and so both must be identically zero since they have disjoint supports. Thus $p_1 \equiv p_1^*$ and $q_1 \equiv q_1^*$. To proceed by induction on $n$, suppose that we have proved that $p_j \equiv p_j^*$ and $q_j \equiv q_j^*$ for $0 \le j \le n - 1$. Then it follows from (10) that

$$p_n(t) + q_n^*(t) \equiv p_n^*(t) + q_n(t).$$

Exactly the same argument as before yields that $p_n \equiv p_n^*$ and $q_n \equiv q_n^*$. Hence the induction is complete and the first assertion of the theorem is proved; the second is proved in the same way.


Applying the preceding theorem to (8), we obtain the next theorem in the case $A = (0, \infty)$; the rest is proved in exactly the same way.


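Theorem 8.4.2. If $\alpha = \alpha_A$ is the first entrance time into $A$, where $A$ is one of the four sets: $(0, \infty)$, $[0, \infty)$, $(-\infty, 0)$, $(-\infty, 0]$, then we have

$$1 - \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\} = \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n \in A\}} e^{itS_n}\, d\mathscr{P}\Big\}; \tag{11}$$

$$\mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n e^{itS_n}\Big\} = \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n \in A^c\}} e^{itS_n}\, d\mathscr{P}\Big\}. \tag{12}$$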


From this result we shall deduce certain analytic expressions involving 
the r.v. a. Before we do that, let us list a number of classical Abelian and 
Tauberian theorems below for ready reference. 


(A) If $c_n \ge 0$ and $\sum_{n=0}^{\infty} c_n r^n$ converges for $0 \le r < 1$, then

$$\lim_{r \uparrow 1} \sum_{n=0}^{\infty} c_n r^n = \sum_{n=0}^{\infty} c_n, \tag{*}$$

finite or infinite.

(B) If $c_n$ are complex numbers and $\sum_{n=0}^{\infty} c_n r^n$ converges for $0 \le r \le 1$, then (*) is true.

(C) If $c_n$ are complex numbers such that $c_n = o(n^{-1})$ [or just $O(n^{-1})$] as $n \to \infty$, and the limit in the left member of (*) exists and is finite, then (*) is true.

(D) If $c_n^{(i)} \ge 0$, $\sum_{n=0}^{\infty} c_n^{(i)} r^n$ converges for $0 \le r < 1$ and diverges for $r = 1$, $i = 1, 2$; and

$$\sum_{k=0}^{n} c_k^{(1)} \sim K \sum_{k=0}^{n} c_k^{(2)}$$

[or more particularly $c_n^{(1)} \sim K c_n^{(2)}$] as $n \to \infty$, where $0 \le K \le +\infty$, then

$$\sum_{n=0}^{\infty} c_n^{(1)} r^n \sim K \sum_{n=0}^{\infty} c_n^{(2)} r^n$$

as $r \uparrow 1$.

(E) If $c_n \ge 0$ and

$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{1}{1 - r}$$

as $r \uparrow 1$, then

$$\sum_{k=0}^{n-1} c_k \sim n.$$

Observe that (C) is a partial converse of (B), and (E) is a partial converse of (D). There is also an analogue of (D), which is actually a consequence of (B): if $c_n$ are complex numbers converging to a finite limit $c$, then as $r \uparrow 1$,

$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{c}{1 - r}.$$

Proposition (A) is trivial. Proposition (B) is Abel’s theorem, and proposi- 
tion (C) is Tauber’s theorem in the “little o” version and Littlewood’s theorem 
in the “big O” version; only the former will be needed below. Proposition (D) 
is an Abelian theorem, and proposition (E) a Tauberian theorem, the latter 
being sometimes referred to as that of Hardy-Littlewood-Karamata. All four 
can be found in the admirable book by Titchmarsh, The theory of functions 
(2nd ed., Oxford University Press, Inc., New York, 1939, pp. 9-10, 224 ff.). 
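
The Abelian statement that $\sum_n c_n r^n \sim c/(1 - r)$ when $c_n \to c$ is easy to watch numerically. The sketch below is only an illustration: the sequence $c_n = 1 + 1/(n+1)$, the values of $r$, and the truncation length are our own choices, and the series is truncated at a point where $r^n$ is negligible.

```python
def abel_mean(coeff, r: float, terms: int) -> float:
    """(1 - r) * sum_{n < terms} coeff(n) * r**n, a truncated Abel mean."""
    total, power = 0.0, 1.0
    for n in range(terms):
        total += coeff(n) * power
        power *= r
    return (1.0 - r) * total


def c(n: int) -> float:
    """Example coefficients converging to the limit c = 1."""
    return 1.0 + 1.0 / (n + 1)


for r in (0.9, 0.99, 0.999):
    # The printed values approach the limit 1 as r increases to 1.
    print(r, abel_mean(c, r, terms=100_000))
```

For this particular sequence the Abel mean equals $1 + (1 - r)\log\frac{1}{1-r}\big/ r$ up to truncation error, which makes the slow (logarithmic) approach to the limit explicit.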


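Theorem 8.4.3. The generating function of $\alpha$ in Theorem 8.4.2 is given by

$$\mathscr{E}\{r^{\alpha}\} = 1 - \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n \in A]\Big\} = 1 - (1 - r) \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n \in A^c]\Big\}. \tag{13}$$

We have

$$\mathscr{P}\{\alpha < \infty\} = 1 \text{ if and only if } \sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n \in A] = \infty; \tag{14}$$

in which case

$$\mathscr{E}\{\alpha\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n \in A^c]\Big\}. \tag{15}$$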

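PROOF. Setting $t = 0$ in (11), we obtain the first equation in (13), from which the second follows at once through

$$\frac{1}{1 - r} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n}\Big\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n \in A] + \sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n \in A^c]\Big\}.$$

Since

$$\lim_{r \uparrow 1} \mathscr{E}\{r^{\alpha}\} = \lim_{r \uparrow 1} \sum_{n=1}^{\infty} \mathscr{P}\{\alpha = n\}\, r^n = \sum_{n=1}^{\infty} \mathscr{P}\{\alpha = n\} = \mathscr{P}\{\alpha < \infty\}$$

by proposition (A), the middle term in (13) tends to a finite limit, hence also the power series in $r$ there (why?). By proposition (A), the said limit may be obtained by setting $r = 1$ in the series. This establishes (14). Finally, setting $t = 0$ in (12), we obtain

$$\mathscr{E}\Big\{\sum_{n=0}^{\alpha - 1} r^n\Big\} = \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n \in A^c]\Big\}. \tag{16}$$

Rewriting the left member in (16) as in (5) and letting $r \uparrow 1$, we obtain

$$\lim_{r \uparrow 1} \sum_{n=0}^{\infty} r^n\, \mathscr{P}\{\alpha > n\} = \sum_{n=0}^{\infty} \mathscr{P}\{\alpha > n\} = \mathscr{E}\{\alpha\} \le \infty \tag{17}$$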

by proposition (A). The right member of (16) tends to the right member of 
(15) by the same token, proving (15). 
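
As an illustration of (13), take the simple symmetric random walk (steps $\pm 1$ with probability $\tfrac12$ each) and $A = (0, \infty)$, so that $\alpha$ is the first passage time to $+1$, whose generating function is classically $(1 - \sqrt{1 - r^2})/r$. The Python sketch below evaluates the first form of the right member of (13), with the series truncated, and compares it with this closed form. The choice of walk, the truncation length, and the function names are assumptions made for the example.

```python
import math


def prob_positive(n: int) -> float:
    """P{S_n > 0} for the simple symmetric walk; by symmetry (1 - P{S_n = 0}) / 2."""
    p_zero = math.comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0
    return 0.5 * (1.0 - p_zero)


def gen_fn_via_13(r: float, terms: int = 400) -> float:
    """Right member of (13) with A = (0, oo), the series cut off after `terms` terms."""
    series = sum(r ** n / n * prob_positive(n) for n in range(1, terms + 1))
    return 1.0 - math.exp(-series)


def gen_fn_closed_form(r: float) -> float:
    """Classical generating function of the first passage time to +1."""
    return (1.0 - math.sqrt(1.0 - r * r)) / r


for r in (0.3, 0.6, 0.9):
    print(r, gen_fn_via_13(r), gen_fn_closed_form(r))
```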
When is &{a} in (15) finite? This is answered by the next theorem.* 


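Theorem 8.4.4. Suppose that $X \not\equiv 0$ and at least one of $\mathscr{E}(X^+)$ and $\mathscr{E}(X^-)$ is finite; then

$$\mathscr{E}(X) > 0 \Rightarrow \mathscr{E}\{\alpha_{(0,\infty)}\} < \infty; \tag{18}$$

$$\mathscr{E}(X) \le 0 \Rightarrow \mathscr{E}\{\alpha_{[0,\infty)}\} = \infty. \tag{19}$$

*It can be shown that $S_n \to +\infty$ a.e. if and only if $\mathscr{E}\{\alpha_{(0,\infty)}\} < \infty$; see A. J. Lemoine, Annals of Probability 2 (1974).

PROOF. If $\mathscr{E}(X) > 0$, then $\mathscr{P}\{S_n \to +\infty\} = 1$ by the strong law of large numbers. Hence $\mathscr{P}\{\liminf_{n \to \infty} S_n = -\infty\} = 0$, and this implies by the dual of Theorem 8.2.4 that $\mathscr{P}\{\alpha_{(-\infty,0)} < \infty\} < 1$. Let us sharpen this slightly to $\mathscr{P}\{\alpha_{(-\infty,0]} < \infty\} < 1$. To see this, write $\alpha' = \alpha_{(-\infty,0]}$ and consider $S_{\alpha'_n}$ as in the proof of Theorem 8.2.4. Clearly $S_{\alpha'_n} \le 0$, and so if $\mathscr{P}\{\alpha' < \infty\} = 1$, one would have $\mathscr{P}\{S_n \le 0 \text{ i.o.}\} = 1$, which is impossible. Now apply (14) to $\alpha_{(-\infty,0]}$ and (15) to $\alpha_{(0,\infty)}$ to infer

$$\mathscr{E}\{\alpha_{(0,\infty)}\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n \le 0]\Big\} < \infty,$$

proving (18). Next if $\mathscr{E}(X) = 0$, then $\mathscr{P}\{\alpha_{(-\infty,0)} < \infty\} = 1$ by Theorem 8.3.4. Hence, applying (14) to $\alpha_{(-\infty,0)}$ and (15) to $\alpha_{[0,\infty)}$, we infer

$$\mathscr{E}\{\alpha_{[0,\infty)}\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n < 0]\Big\} = \infty.$$

Finally, if $\mathscr{E}(X) < 0$, then $\mathscr{P}\{\alpha_{[0,\infty)} = \infty\} > 0$ by an argument dual to that given above for $\alpha_{(-\infty,0]}$, and so $\mathscr{E}\{\alpha_{[0,\infty)}\} = \infty$ trivially.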
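
Formula (15) can be checked numerically in the case (18). For the walk with steps $+1$ (probability $p > \tfrac12$) and $-1$ (probability $q = 1 - p$), the first entrance into $(0, \infty)$ occurs at the level $+1$, so Wald's equation gives $\mathscr{E}\{\alpha_{(0,\infty)}\} = 1/(p - q)$. The sketch below, an illustration with our own choice of $p$, truncation length, and function names, evaluates the right member of (15) with $A^c = (-\infty, 0]$ and compares the two.

```python
import math


def prob_nonpositive(n: int, p: float) -> float:
    """P{S_n <= 0} when each step is +1 with prob. p and -1 with prob. 1 - p."""
    q = 1.0 - p
    # S_n = 2k - n <= 0 exactly when the number k of +1 steps is at most n/2.
    return sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(n // 2 + 1))


def expected_entrance_time(p: float, terms: int = 300) -> float:
    """Right member of (15) for A = (0, oo); the series is cut off after `terms` terms."""
    series = sum(prob_nonpositive(n, p) / n for n in range(1, terms + 1))
    return math.exp(series)


for p in (0.7, 0.8):
    print(p, expected_entrance_time(p), 1.0 / (2.0 * p - 1.0))
```

The truncation is harmless here because $\mathscr{P}[S_n \le 0]$ decays geometrically when $p > \tfrac12$; for $p$ close to $\tfrac12$ many more terms would be needed.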


Incidentally we have shown that the two r.v.'s $\alpha_{(0,\infty)}$ and $\alpha_{[0,\infty)}$ have both finite or both infinite expectations. Comparing this remark with (15), we derive an analytic by-product as follows.


Corollary. We have

$$\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n = 0] < \infty. \tag{20}$$

This can also be shown, purely analytically, by means of Exercise 25 of Sec. 6.4.


The astonishing part of Theorem 8.4.4 is the case when the random walk is recurrent, which is the case if $\mathscr{E}(X) = 0$ by Theorem 8.3.4. Then the set $[0, \infty)$, which is more than half of the whole range, is revisited an infinite number of times. Nevertheless (19) says that the expected time for even one visit is infinite! This phenomenon becomes more paradoxical if one reflects that the same is true for the other half $(-\infty, 0]$, and yet in a single step one of the two halves will certainly be visited. Thus we have:

$$\alpha_{(-\infty,0]} \wedge \alpha_{[0,\infty)} = 1, \qquad \mathscr{E}\{\alpha_{(-\infty,0]}\} = \mathscr{E}\{\alpha_{[0,\infty)}\} = \infty.$$


Another curious by-product concerning the strong law of large numbers is obtained by combining Theorem 8.4.3 with Theorem 8.2.4.

Theorem 8.4.5. $S_n/n \to m$ a.e. for a finite constant $m$ if and only if for every $\epsilon > 0$ we have

$$\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}\Big\{\Big|\frac{S_n}{n} - m\Big| > \epsilon\Big\} < \infty. \tag{21}$$

PROOF. Without loss of generality we may suppose $m = 0$. We know from Theorem 5.4.2 that $S_n/n \to 0$ a.e. if and only if $\mathscr{E}(|X|) < \infty$ and $\mathscr{E}(X) = 0$. If this is so, consider the stationary independent process $\{X'_n, n \in N\}$, where $X'_n = X_n - \epsilon$, $\epsilon > 0$; and let $S'_n$ and $\alpha' = \alpha'_{(0,\infty)}$ be the corresponding r.v.'s for this modified process. Since $\mathscr{E}(X') = -\epsilon$, it follows from the strong law of large numbers that $S'_n \to -\infty$ a.e., and consequently by Theorem 8.2.4 we have $\mathscr{P}\{\alpha' < \infty\} < 1$. Hence we have by (14) applied to $\alpha'$:

$$\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}[S_n - n\epsilon > 0] < \infty. \tag{22}$$

By considering $X_n + \epsilon$ instead of $X_n - \epsilon$, we obtain a similar result with "$S_n - n\epsilon > 0$" in (22) replaced by "$S_n + n\epsilon < 0$". Combining the two, we obtain (21) when $m = 0$.

Conversely, if (21) is true with $m = 0$, then the argument above yields $\mathscr{P}\{\alpha' < \infty\} < 1$, and so by Theorem 8.2.4, $\mathscr{P}\{\limsup_{n \to \infty} S'_n = +\infty\} = 0$. A fortiori we have

$$\forall \epsilon > 0: \mathscr{P}\{S'_n > n\epsilon \text{ i.o.}\} = \mathscr{P}\{S_n > 2n\epsilon \text{ i.o.}\} = 0.$$

Similarly we obtain $\forall \epsilon > 0$: $\mathscr{P}\{S_n < -2n\epsilon \text{ i.o.}\} = 0$, and the last two relations together mean exactly $S_n/n \to 0$ a.e. (cf. Theorem 4.2.2).


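Having investigated the stopping time $\alpha$, we proceed to investigate the stopping place $S_{\alpha}$, where $\alpha < \infty$. The crucial case will be handled first.


Theorem 8.4.6. If $\mathscr{E}(X) = 0$ and $0 < \mathscr{E}(X^2) = \sigma^2 < \infty$, then

$$\mathscr{E}\{S_{\alpha}\} = \frac{\sigma}{\sqrt{2}} \exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n} \Big[\frac{1}{2} - \mathscr{P}(S_n \in A)\Big]\Big\} < \infty. \tag{23}$$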


PROOF. Observe that $\mathscr{E}(X) = 0$ implies each of the four r.v.'s $\alpha$ is finite a.e. by Theorem 8.3.4. We now switch from Fourier transform to Laplace transform in (11), and suppose for the sake of definiteness that $A = (0, \infty)$. It is readily verified that Theorem 6.6.5 is applicable, which yields

$$1 - \mathscr{E}\{r^{\alpha} e^{-\lambda S_{\alpha}}\} = \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n > 0\}} e^{-\lambda S_n}\, d\mathscr{P}\Big\} \tag{24}$$

for $0 \le r < 1$, $0 \le \lambda < \infty$. Letting $r \uparrow 1$ in (24), we obtain an expression for the Laplace transform of $S_{\alpha}$, but we must go further by differentiating (24) with respect to $\lambda$ to obtain

$$\mathscr{E}\{r^{\alpha} e^{-\lambda S_{\alpha}} S_{\alpha}\} = \sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n > 0\}} S_n e^{-\lambda S_n}\, d\mathscr{P} \cdot \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n > 0\}} e^{-\lambda S_n}\, d\mathscr{P}\Big\}, \tag{25}$$

the justification for termwise differentiation being easy, since $\mathscr{E}\{|S_n|\} \le n\, \mathscr{E}\{|X|\}$. If we now set $\lambda = 0$ in (25), the result is

$$\mathscr{E}\{r^{\alpha} S_{\alpha}\} = \sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{E}\{S_n^+\} \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{P}[S_n > 0]\Big\}. \tag{26}$$

By Exercise 2 of Sec. 6.4, we have as $n \to \infty$,

$$\mathscr{E}\Big\{\frac{S_n^+}{n}\Big\} \sim \frac{\sigma}{\sqrt{2\pi n}},$$

so that the coefficients in the first power series in (26) are asymptotically equal to $\sigma/\sqrt{2}$ times those of

$$(1 - r)^{-1/2} = \sum_{n=0}^{\infty} \frac{1}{2^{2n}} \binom{2n}{n} r^n,$$

since

$$\frac{1}{2^{2n}} \binom{2n}{n} \sim \frac{1}{\sqrt{\pi n}}.$$

It follows from proposition (D) above that

$$\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{E}\{S_n^+\} \sim \frac{\sigma}{\sqrt{2}} (1 - r)^{-1/2} = \frac{\sigma}{\sqrt{2}} \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{2n}\Big\}.$$

Substituting into (26), and observing that as $r \uparrow 1$, the left member of (26) tends to $\mathscr{E}\{S_{\alpha}\} \le \infty$ by the monotone convergence theorem, we obtain

$$\mathscr{E}\{S_{\alpha}\} = \frac{\sigma}{\sqrt{2}} \lim_{r \uparrow 1} \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} \Big[\frac{1}{2} - \mathscr{P}(S_n > 0)\Big]\Big\}. \tag{27}$$

It remains to prove that the limit above is finite, for then the limit of the power series is also finite (why? it is precisely here that the Laplace transform saves the day for us), and since the coefficients are $o(1/n)$ by the central limit theorem, and certainly $O(1/n)$ in any event, proposition (C) above will identify it as the right member of (23) with $A = (0, \infty)$.

Now by analogy with (27), replacing $(0, \infty)$ by $(-\infty, 0]$ and writing $\alpha_{(-\infty,0]}$ as $\beta$, we have

$$\mathscr{E}\{S_{\beta}\} = -\frac{\sigma}{\sqrt{2}} \lim_{r \uparrow 1} \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} \Big[\frac{1}{2} - \mathscr{P}(S_n \le 0)\Big]\Big\}. \tag{28}$$

Clearly the product of the two exponentials in (27) and (28) is just $\exp 0 = 1$, hence if the limit in (27) were $+\infty$, that in (28) would have to be 0. But since $\mathscr{E}(X) = 0$ and $\mathscr{E}(X^2) > 0$, we have $\mathscr{P}(X < 0) > 0$, which implies at once $\mathscr{P}(S_{\beta} < 0) > 0$ and consequently $\mathscr{E}\{S_{\beta}\} < 0$. This contradiction proves that the limits in (27) and (28) must both be finite, and the theorem is proved.
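
A worked check of (23), added here as an illustration: for the simple symmetric random walk (steps $\pm 1$ with probability $\tfrac12$ each) we have $\sigma = 1$, and with $A = (0, \infty)$ the walk enters $A$ at the point $+1$, so that $S_{\alpha} \equiv 1$. The right member of (23) can be summed in closed form and indeed gives 1.

```latex
% By symmetry, 1/2 - P(S_n > 0) = (1/2) P(S_n = 0), and P(S_{2m} = 0) = 4^{-m} \binom{2m}{m}.
\[
\sum_{n=1}^{\infty} \frac{1}{n}\Big[\frac12 - \mathscr{P}(S_n > 0)\Big]
  = \frac12 \sum_{m=1}^{\infty} \frac{1}{2m}\,\frac{1}{4^{m}}\binom{2m}{m}
  = \frac14 \sum_{m=1}^{\infty} \frac{1}{m}\binom{2m}{m}\Big(\frac14\Big)^{m}.
\]
% Integrating \sum_{m \ge 1} \binom{2m}{m} x^{m-1} = ((1-4x)^{-1/2} - 1)/x from 0 to x gives
\[
\sum_{m=1}^{\infty} \frac{1}{m}\binom{2m}{m} x^{m} = 2\log\frac{2}{1+\sqrt{1-4x}},
\qquad 0 \le x \le \tfrac14 ,
\]
% which equals 2 log 2 at x = 1/4. Hence
\[
\frac{\sigma}{\sqrt2}\exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n}\Big[\frac12 - \mathscr{P}(S_n > 0)\Big]\Big\}
  = \frac{1}{\sqrt2}\exp\Big\{\frac12\log 2\Big\} = 1 = \mathscr{E}\{S_{\alpha}\}.
\]
```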


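Theorem 8.4.7. Suppose that $X \not\equiv 0$ and at least one of $\mathscr{E}(X^+)$ and $\mathscr{E}(X^-)$ is finite; and let $\alpha = \alpha_{(0,\infty)}$, $\beta = \alpha_{(-\infty,0]}$.

(i) If $\mathscr{E}(X) > 0$ but may be $+\infty$, then $\mathscr{E}(S_{\alpha}) = \mathscr{E}(\alpha)\, \mathscr{E}(X)$.

(ii) If $\mathscr{E}(X) = 0$, then $\mathscr{E}(S_{\alpha})$ and $\mathscr{E}(S_{\beta})$ are both finite if and only if $\mathscr{E}(X^2) < \infty$.

PROOF. The assertion (i) is a consequence of (18) and Wald's equation (Theorem 5.5.3 and Exercise 8 of Sec. 5.5). The "if" part of assertion (ii) has been proved in the preceding theorem; indeed we have even "evaluated" $\mathscr{E}(S_{\alpha})$. To prove the "only if" part, we apply (11) to both $\alpha$ and $\beta$ in the preceding proof and multiply the results together to obtain the remarkable equation:

$$[1 - \mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\}][1 - \mathscr{E}\{r^{\beta} e^{itS_{\beta}}\}] = \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n} f(t)^n\Big\} = 1 - r f(t). \tag{29}$$

Setting $r = 1$, we obtain for $t \ne 0$:

$$\frac{1 - f(t)}{t^2} = \frac{1 - \mathscr{E}\{e^{itS_{\alpha}}\}}{-it} \cdot \frac{1 - \mathscr{E}\{e^{itS_{\beta}}\}}{+it}.$$

Letting $t \downarrow 0$, the right member above tends to $\mathscr{E}\{S_{\alpha}\}\, \mathscr{E}\{-S_{\beta}\}$ by Theorem 6.4.2. Hence the left member has a real and finite limit. This implies $\mathscr{E}(X^2) < \infty$ by Exercise 1 of Sec. 6.4.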


8.5 Continuation 


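Our next study concerns the r.v.'s $M_n$ and $M$ defined in (12) and (13) of Sec. 8.2. It is convenient to introduce a new r.v. $L_n$, which is the first time (necessarily belonging to $N_n^0$) that $M_n$ is attained:

$$\forall n \in N^0: \quad L_n(\omega) = \min\{k \in N_n^0 : S_k(\omega) = M_n(\omega)\}; \tag{1}$$

note that $L_0 = 0$. We shall also use the abbreviation

$$\alpha = \alpha_{(0,\infty)}, \qquad \beta = \alpha_{(-\infty,0]} \tag{2}$$

as in Theorems 8.4.6 and 8.4.7.

For each $n$ consider the special permutation below:

$$\rho_n = \begin{pmatrix} 1, & 2, & \ldots, & n \\ n, & n-1, & \ldots, & 1 \end{pmatrix}, \tag{3}$$

which amounts to a "reversal" of the ordered set of indices in $N_n$. Recalling that $\beta \circ \rho_n$ is the r.v. whose value at $\omega$ is $\beta(\rho_n \omega)$, and $S_k \circ \rho_n = S_n - S_{n-k}$ for $1 \le k \le n$, we have

$$\{\beta \circ \rho_n > n\} = \bigcap_{k=1}^{n} \{S_k \circ \rho_n > 0\} = \bigcap_{k=1}^{n} \{S_n > S_{n-k}\} = \bigcap_{k=0}^{n-1} \{S_n > S_k\} = \{L_n = n\}.$$

It follows from (8) of Sec. 8.1 that

$$\int_{\{\beta > n\}} e^{itS_n}\, d\mathscr{P} = \int_{\{\beta \circ \rho_n > n\}} e^{it(S_n \circ \rho_n)}\, d\mathscr{P} = \int_{\{L_n = n\}} e^{itS_n}\, d\mathscr{P}. \tag{4}$$

Applying (5) and (12) of Sec. 8.4 to $\beta$ and substituting (4), we obtain

$$\sum_{n=0}^{\infty} r^n \int_{\{L_n = n\}} e^{itS_n}\, d\mathscr{P} = \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n > 0\}} e^{itS_n}\, d\mathscr{P}\Big\}; \tag{5}$$

applying (5) and (12) of Sec. 8.4 to $\alpha$ and substituting the obvious relation $\{\alpha > n\} = \{L_n = 0\}$, we obtain

$$\sum_{n=0}^{\infty} r^n \int_{\{L_n = 0\}} e^{itS_n}\, d\mathscr{P} = \exp\Big\{+\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n \le 0\}} e^{itS_n}\, d\mathscr{P}\Big\}. \tag{6}$$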


We are ready for the main results for M,, and M, known as Spitzer’s identity: 


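Theorem 8.5.1. We have for $0 < r < 1$:

$$\sum_{n=0}^{\infty} r^n\, \mathscr{E}\{e^{itM_n}\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{E}(e^{itS_n^+})\Big\}. \tag{7}$$

$M$ is finite a.e. if and only if

$$\sum_{n=1}^{\infty} \frac{1}{n}\, \mathscr{P}\{S_n > 0\} < \infty, \tag{8}$$

in which case we have

$$\mathscr{E}\{e^{itM}\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{1}{n} [\mathscr{E}(e^{itS_n^+}) - 1]\Big\}. \tag{9}$$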

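PROOF. Observe the basic equation that follows at once from the meaning of $L_n$:

$$\{L_n = k\} = \{L_k = k\} \cap \{L_{n-k} \circ \tau^k = 0\}, \tag{10}$$

where $\tau^k$ is the $k$th iterate of the shift. Since the two events on the right side of (10) are independent, we obtain for each $n \in N^0$, and real $t$ and $u$:

$$\mathscr{E}\{e^{itM_n} e^{iu(S_n - M_n)}\} = \sum_{k=0}^{n} \int_{\{L_n = k\}} e^{itS_k} e^{iu(S_n - S_k)}\, d\mathscr{P} \tag{11}$$

$$= \sum_{k=0}^{n} \int_{\{L_k = k\}} e^{itS_k}\, d\mathscr{P} \int_{\{L_{n-k} \circ \tau^k = 0\}} e^{iu(S_{n-k} \circ \tau^k)}\, d\mathscr{P}$$

$$= \sum_{k=0}^{n} \int_{\{L_k = k\}} e^{itS_k}\, d\mathscr{P} \int_{\{L_{n-k} = 0\}} e^{iuS_{n-k}}\, d\mathscr{P}.$$

It follows that

$$\sum_{n=0}^{\infty} r^n\, \mathscr{E}\{e^{itM_n} e^{iu(S_n - M_n)}\} = \sum_{n=0}^{\infty} r^n \int_{\{L_n = n\}} e^{itS_n}\, d\mathscr{P} \cdot \sum_{n=0}^{\infty} r^n \int_{\{L_n = 0\}} e^{iuS_n}\, d\mathscr{P}. \tag{12}$$

Setting $u = 0$ in (12) and using (5) as it stands and (6) with $t = 0$, we obtain

$$\sum_{n=0}^{\infty} r^n\, \mathscr{E}\{e^{itM_n}\} = \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} \Big[\int_{\{S_n > 0\}} e^{itS_n}\, d\mathscr{P} + \int_{\{S_n \le 0\}} 1\, d\mathscr{P}\Big]\Big\},$$

which reduces to (7).

Next, by Theorem 8.2.4, $M < \infty$ a.e. if and only if $\mathscr{P}\{\alpha < \infty\} < 1$ or equivalently by (14) of Sec. 8.4 if and only if (8) holds. In this case the convergence theorem for ch.f.'s asserts that

$$\mathscr{E}\{e^{itM}\} = \lim_{n \to \infty} \mathscr{E}\{e^{itM_n}\},$$

and consequently

$$\mathscr{E}\{e^{itM}\} = \lim_{r \uparrow 1} (1 - r) \sum_{n=0}^{\infty} r^n\, \mathscr{E}\{e^{itM_n}\}$$

$$= \lim_{r \uparrow 1} \exp\Big\{-\sum_{n=1}^{\infty} \frac{r^n}{n}\Big\} \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n}\, \mathscr{E}(e^{itS_n^+})\Big\}$$

$$= \lim_{r \uparrow 1} \exp\Big\{\sum_{n=1}^{\infty} \frac{r^n}{n} [\mathscr{E}(e^{itS_n^+}) - 1]\Big\},$$

where the first equation is by proposition (B) in Sec. 8.4. Since

$$\sum_{n=1}^{\infty} \frac{1}{n} |\mathscr{E}(e^{itS_n^+}) - 1| \le \sum_{n=1}^{\infty} \frac{2}{n}\, \mathscr{P}[S_n > 0] < \infty$$

by (8), the last-written limit above is equal to the right member of (9), by proposition (B). Theorem 8.5.1 is completely proved.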


By switching to Laplace transforms in (7) as done in the proof of Theorem 8.4.6 and using proposition (E) in Sec. 8.4, it is possible to give an "analytic" derivation of the second assertion of Theorem 8.5.1 without recourse to the "probabilistic" result in Theorem 8.2.4. This kind of mathematical gambit should appeal to the curious as well as the obstinate; see Exercise 9 below. Another interesting exercise is to establish the following result:

$$\mathscr{E}(M_n) = \sum_{k=1}^{n} \frac{1}{k}\, \mathscr{E}(S_k^+) \tag{13}$$

by differentiating (7) with respect to $t$. But a neat little formula such as (13) deserves a simpler proof, so here it is.


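Proof of (13). Writing

$$M_n = \bigvee_{j=0}^{n} S_j = \Big(\bigvee_{j=1}^{n} S_j\Big)^+$$

and dropping "$d\mathscr{P}$" in the integrals below:

$$\mathscr{E}(M_n) = \int_{\{M_n > 0\}} \bigvee_{k=1}^{n} S_k$$

$$= \int_{\{M_n > 0;\, S_n > 0\}} \Big[X_1 + 0 \vee \bigvee_{k=2}^{n} (S_k - X_1)\Big] + \int_{\{M_n > 0;\, S_n \le 0\}} \bigvee_{k=1}^{n-1} S_k$$

$$= \int_{\{S_n > 0\}} X_1 + \int_{\{S_n > 0\}} \Big[0 \vee \bigvee_{k=2}^{n} (S_k - X_1)\Big] + \int_{\{M_{n-1} > 0;\, S_n \le 0\}} \bigvee_{k=1}^{n-1} S_k.$$

Call the last three integrals $\int_1$, $\int_2$, and $\int_3$. We have on grounds of symmetry

$$\int_1 = \frac{1}{n} \int_{\{S_n > 0\}} S_n.$$

Apply the cyclical permutation

$$\begin{pmatrix} 1, & 2, & \ldots, & n \\ 2, & 3, & \ldots, & 1 \end{pmatrix}$$

to $\int_2$ to obtain

$$\int_2 = \int_{\{S_n > 0\}} M_{n-1}.$$

Obviously, we have

$$\int_3 = \int_{\{M_{n-1} > 0;\, S_n \le 0\}} M_{n-1} = \int_{\{S_n \le 0\}} M_{n-1}.$$

Gathering these, we obtain

$$\mathscr{E}(M_n) = \frac{1}{n} \int_{\{S_n > 0\}} S_n + \int_{\{S_n > 0\}} M_{n-1} + \int_{\{S_n \le 0\}} M_{n-1} = \frac{1}{n}\, \mathscr{E}(S_n^+) + \mathscr{E}(M_{n-1}),$$

and (13) follows by recursion.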
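
Formula (13) lends itself to an exact check by brute force. The following Python sketch, offered as an illustration rather than as part of the proof, enumerates all $2^n$ equally likely paths of the simple symmetric random walk (our choice of example) and compares $\mathscr{E}(M_n)$ with $\sum_{k=1}^{n} k^{-1} \mathscr{E}(S_k^+)$ in rational arithmetic; the function names are ours.

```python
from fractions import Fraction
from itertools import product


def expected_max(n: int) -> Fraction:
    """E(M_n), M_n = max(0, S_1, ..., S_n), over all 2^n equally likely paths."""
    total = 0
    for steps in product((1, -1), repeat=n):
        position, running_max = 0, 0
        for step in steps:
            position += step
            running_max = max(running_max, position)
        total += running_max
    return Fraction(total, 2 ** n)


def expected_positive_part(k: int) -> Fraction:
    """E(S_k^+) by the same enumeration."""
    total = sum(max(sum(steps), 0) for steps in product((1, -1), repeat=k))
    return Fraction(total, 2 ** k)


def rhs_of_13(n: int) -> Fraction:
    """Right member of (13)."""
    return sum((expected_positive_part(k) / k for k in range(1, n + 1)), Fraction(0))


for n in range(1, 9):
    print(n, expected_max(n), rhs_of_13(n))
```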


Another interesting quantity in the development was the "number of strictly positive terms" in the random walk. We shall treat this by combinatorial methods as an antidote to the analytic skulduggery above. Let us define

$$\nu_n(\omega) = \text{the number of } k \text{ in } N_n \text{ such that } S_k(\omega) > 0;$$

$$\nu'_n(\omega) = \text{the number of } k \text{ in } N_n \text{ such that } S_k(\omega) \le 0.$$

For easy reference let us repeat two previous definitions together with two new ones below, for $n \in N^0$:

$$M_n(\omega) = \max_{0 \le j \le n} S_j(\omega); \qquad L_n(\omega) = \min\{j \in N_n^0 : S_j(\omega) = M_n(\omega)\};$$

$$M'_n(\omega) = \min_{0 \le j \le n} S_j(\omega); \qquad L'_n(\omega) = \max\{j \in N_n^0 : S_j(\omega) = M'_n(\omega)\}.$$

Since the reversal given in (3) is a 1-to-1 measure preserving mapping that leaves $S_n$ unchanged, it is clear that for each $\Lambda \in \mathscr{F}_{\infty}$, the measures on $\mathscr{B}^1$ below are equal:

$$\mathscr{P}\{\Lambda; S_n \in \cdot\} = \mathscr{P}\{\rho_n \Lambda; S_n \in \cdot\}. \tag{14}$$

Lemma. For each $n \in N^0$, the two random vectors

$$(L_n, S_n) \quad \text{and} \quad (n - L'_n, S_n)$$

have the same distribution.


PROOF. This will follow from (14) if we show that

$$\forall k \in N_n^0: \quad \rho_n^{-1}\{L_n = k\} = \{L'_n = n - k\}. \tag{15}$$

Now for $\rho_n \omega$ the first $n + 1$ partial sums $S_j(\rho_n \omega)$, $j \in N_n^0$, are

$$0,\ \omega_n,\ \omega_n + \omega_{n-1},\ \ldots,\ \omega_n + \cdots + \omega_{n-j+1},\ \ldots,\ \omega_n + \cdots + \omega_1,$$

which is the same as

$$S_n - S_n,\ S_n - S_{n-1},\ S_n - S_{n-2},\ \ldots,\ S_n - S_{n-j},\ \ldots,\ S_n - S_0,$$

from which (15) follows by inspection.

Theorem 8.5.2. For each $n \in N^0$, the random vectors

$$(L_n, S_n) \quad \text{and} \quad (\nu_n, S_n) \tag{16}$$

have the same distribution; and the random vectors

$$(L'_n, S_n) \quad \text{and} \quad (\nu'_n, S_n) \tag{16'}$$

have the same distribution.


PROOF. For $n = 0$ there is nothing to prove; for $n = 1$, the assertion about (16) is trivially true since $\{L_1 = 0\} = \{S_1 \le 0\} = \{\nu_1 = 0\}$; similarly for (16'). We shall prove the general case by simultaneous induction, supposing that both assertions have been proved when $n$ is replaced by $n - 1$. For each $k \in N_{n-1}^0$ and $y \in \mathscr{R}^1$, let us put

$$G(y) = \mathscr{P}\{L_{n-1} = k; S_{n-1} \le y\}, \qquad H(y) = \mathscr{P}\{\nu_{n-1} = k; S_{n-1} \le y\}.$$

Then the induction hypothesis implies that $G \equiv H$. Since $X_n$ is independent of $\mathscr{F}_{n-1}$ and so of the vector $(L_{n-1}, S_{n-1})$, we have for each $x \in \mathscr{R}^1$:

$$\mathscr{P}\{L_{n-1} = k; S_n \le x\} = \int_{-\infty}^{\infty} F(x - y)\, dG(y) = \int_{-\infty}^{\infty} F(x - y)\, dH(y) = \mathscr{P}\{\nu_{n-1} = k; S_n \le x\}, \tag{17}$$

where $F$ is the common d.f. of each $X_n$. Now observe that on the set $\{S_n \le 0\}$ we have $L_n = L_{n-1}$ by definition, hence if $x \le 0$:

$$\{\omega : L_n(\omega) = k; S_n(\omega) \le x\} = \{\omega : L_{n-1}(\omega) = k; S_n(\omega) \le x\}. \tag{18}$$

On the other hand, on the set $\{S_n \le 0\}$ we have $\nu_{n-1} = \nu_n$, so that if $x \le 0$,

$$\{\omega : \nu_n(\omega) = k; S_n(\omega) \le x\} = \{\omega : \nu_{n-1}(\omega) = k; S_n(\omega) \le x\}. \tag{19}$$

Combining (17) with (19), we obtain

$$\forall k \in N_n^0,\ x \le 0: \quad \mathscr{P}\{L_n = k; S_n \le x\} = \mathscr{P}\{\nu_n = k; S_n \le x\}. \tag{20}$$

Next if $k \in N_n^0$, and $x \ge 0$, then by similar arguments:

$$\{\omega : L'_n(\omega) = n - k; S_n(\omega) > x\} = \{\omega : L'_{n-1}(\omega) = n - k; S_n(\omega) > x\},$$

$$\{\omega : \nu'_n(\omega) = n - k; S_n(\omega) > x\} = \{\omega : \nu'_{n-1}(\omega) = n - k; S_n(\omega) > x\}.$$

Using (16') when $n$ is replaced by $n - 1$, we obtain the analogue of (20):

$$\forall k \in N_n^0,\ x \ge 0: \quad \mathscr{P}\{L'_n = n - k; S_n > x\} = \mathscr{P}\{\nu'_n = n - k; S_n > x\}. \tag{21}$$

The left members in (21) and (22) below are equal by the lemma above, while the right members are trivially equal since $\nu_n + \nu'_n = n$:

$$\forall k \in N_n^0,\ x \ge 0: \quad \mathscr{P}\{L_n = k; S_n > x\} = \mathscr{P}\{\nu_n = k; S_n > x\}. \tag{22}$$

Combining (20) and (22) for $x = 0$, we have

$$\forall k \in N_n^0: \quad \mathscr{P}\{L_n = k\} = \mathscr{P}\{\nu_n = k\}; \tag{23}$$

subtracting (22) from (23), we obtain the equation in (20) for $x \ge 0$; hence it is true for every $x$, proving the assertion about (16). Similarly for (16'), and the induction is complete.


As an immediate consequence of Theorem 8.5.2, the obvious relation 
(10) is translated into the by-no-means obvious relation (24) below: 


Theorem 8.5.3. We have for $k \in N_n^0$:
(24) $\mathscr{P}\{\nu_n = k\} = \mathscr{P}\{\nu_k = k\}\,\mathscr{P}\{\nu_{n-k} = 0\}.$

If the common distribution of each $X_n$ is symmetric with no atom at zero,
then

(25) $\forall k \in N_n^0:\ \mathscr{P}\{\nu_n = k\} = (-1)^n \binom{-\frac{1}{2}}{k} \binom{-\frac{1}{2}}{n-k}.$

PROOF. Let us denote the number on the right side of (25), which is equal to

$\dfrac{1}{2^{2n}} \binom{2k}{k} \binom{2n-2k}{n-k},$

by $a_n(k)$. Then for each $n \in N$, $\{a_n(k), k \in N_n^0\}$ is a well-known probability
distribution. For $n = 1$ we have

$\mathscr{P}\{\nu_1 = 0\} = \mathscr{P}\{\nu_1 = 1\} = \frac{1}{2} = a_n(0) = a_n(n),$

so that (25) holds trivially. Suppose now that it holds when $n$ is replaced by
$n - 1$; then for $k \in N_{n-1}$ we have by (24):

$\mathscr{P}\{\nu_n = k\} = (-1)^k \binom{-\frac{1}{2}}{k} (-1)^{n-k} \binom{-\frac{1}{2}}{n-k} = a_n(k).$

It follows that

$\mathscr{P}\{\nu_n = 0\} + \mathscr{P}\{\nu_n = n\} = 1 - \sum_{k=1}^{n-1} \mathscr{P}\{\nu_n = k\} = 1 - \sum_{k=1}^{n-1} a_n(k) = a_n(0) + a_n(n).$


Under the hypotheses of the theorem, it is clear by considering the dual random 
walk that the two terms in the first member above are equal; since the two 
terms in the last member are obviously equal, they are all equal and the 
theorem is proved. 

Stirling’s formula and elementary calculus now lead to the famous “arcsin 
law”, first discovered by Paul Lévy (1939) for Brownian motion. 


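Theorem 8.5.4. If the common distribution of a stationary independent
process is symmetric, then we have

$\forall x \in [0, 1]:\ \lim_{n \to \infty} \mathscr{P}\left\{ \dfrac{\nu_n}{n} \le x \right\} = \dfrac{2}{\pi} \arcsin \sqrt{x} = \dfrac{1}{\pi} \int_0^x \dfrac{du}{\sqrt{u(1-u)}}.$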


This limit theorem also holds for an independent, not necessarily stationary 
process, in which each X, has mean 0 and variance 1 and such that the 
classical central limit theorem is applicable. This can be proved by the same 
method (invariance principle) as Theorem 7.3.3. 
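
As a numerical companion to Theorems 8.5.3 and 8.5.4, the following is a minimal sketch in Python, not part of the original text. It checks with exact rational arithmetic that the two expressions for $a_n(k)$ agree, that they sum to 1, that they factor as in (24), and it compares the discrete d.f. built from $a_n(k)$ with the arcsin limit. The function names (`gen_binom`, `a`, `a_via_25`, `discrete_cdf`) and the choice $n = 400$ are assumptions made only for illustration.

```python
from fractions import Fraction
from math import asin, comb, pi, sqrt


def gen_binom(alpha, k):
    """Generalized binomial coefficient C(alpha, k) for rational alpha."""
    result = Fraction(1)
    for i in range(k):
        result *= (alpha - i)
        result /= (i + 1)
    return result


def a(n, k):
    """a_n(k) = 2^(-2n) * C(2k, k) * C(2n - 2k, n - k), as an exact fraction."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)


def a_via_25(n, k):
    """Right side of (25): (-1)^n * C(-1/2, k) * C(-1/2, n - k)."""
    half = Fraction(-1, 2)
    return (-1) ** n * gen_binom(half, k) * gen_binom(half, n - k)


def arcsine_cdf(x):
    """Limiting d.f. of Theorem 8.5.4: (2/pi) * arcsin(sqrt(x))."""
    return 2 / pi * asin(sqrt(x))


def discrete_cdf(n, x):
    """Sum of a_n(k) over k <= n*x, i.e. P{nu_n / n <= x} when (25) holds."""
    return float(sum(a(n, k) for k in range(int(n * x) + 1)))


if __name__ == "__main__":
    n = 400  # illustrative choice
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, discrete_cdf(n, x), arcsine_cdf(x))
```

The comparison only illustrates the limit; it is not a proof, and the rate at which the two columns approach each other is not claimed here.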
EXERCISES

1. Derive (3) of Sec. 8.4 by considering

$\mathscr{E}\{r^{\alpha} e^{itS_{\alpha}}\} = \sum_{n=1}^{\infty} r^n \left[ \int_{\{\alpha > n-1\}} e^{itS_n}\, d\mathscr{P} - \int_{\{\alpha > n\}} e^{itS_n}\, d\mathscr{P} \right].$

*2. Under the conditions of Theorem 8.4.6, show that

$\mathscr{E}\{S_{\alpha}\} = -\lim_{n \to \infty} \int_{\{\alpha > n\}} S_n\, d\mathscr{P}.$

3. Find an expression for the Laplace transform of $S_{\alpha}$. Is the corresponding
formula for the Fourier transform valid?
*4. Prove (13) of Sec. 8.5 by differentiating (7) there; justify the steps.
5. Prove that

$\sum_{n=0}^{\infty} r^n \mathscr{P}[M_n = 0] = \exp\left\{ \sum_{n=1}^{\infty} \dfrac{r^n}{n} \mathscr{P}[S_n \le 0] \right\}$

and deduce that

$\mathscr{P}[M = 0] = \exp\left\{ -\sum_{n=1}^{\infty} \dfrac{1}{n} \mathscr{P}[S_n > 0] \right\}.$
6. If $M < \infty$ a.e., then it has an infinitely divisible distribution.
7. Prove that

$\sum_{n=0}^{\infty} \mathscr{P}\{L_n = 0;\ S_n = 0\} = \exp\left\{ \sum_{n=1}^{\infty} \dfrac{1}{n} \mathscr{P}[S_n = 0] \right\}.$

[HINT: One way to deduce this is to switch to Laplace transforms in (6) of
Sec. 8.5 and let $\lambda \to \infty$.]
8. Prove that the left member of the equation in Exercise 7 is equal to

$\left[ 1 - \sum_{n=1}^{\infty} \mathscr{P}\{\alpha' = n;\ S_n = 0\} \right]^{-1},$

where $\alpha' = \alpha_{[0,\infty)}$; hence prove its convergence.

*9. Prove that

$\sum_{n=1}^{\infty} \dfrac{1}{n} \mathscr{P}[S_n > 0] < \infty$

implies $\mathscr{P}(M < \infty) = 1$ as follows. From the Laplace transform version of
(7) of Sec. 8.5, show that for $\lambda > 0$,

$\lim_{r \uparrow 1} (1 - r) \sum_{n=0}^{\infty} r^n \mathscr{E}\{e^{-\lambda M_n}\}$

exists and is finite, say $= \chi(\lambda)$. Now use proposition (E) in Sec. 8.4 and apply
the convergence theorem for Laplace transforms (Theorem 6.6.3).
*10. Define a sequence of r.v.'s $\{Y_n, n \in N^0\}$ as follows:

$Y_0 = 0, \quad Y_{n+1} = (Y_n + X_{n+1})^+, \quad n \in N^0,$

where $\{X_n, n \in N\}$ is a stationary independent sequence. Prove that for each
$n$, $Y_n$ and $M_n$ have the same distribution. [This approach is useful in queuing
theory.]
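11. If $-\infty < \mathscr{E}(X) < 0$, then

$\mathscr{E}(X) = \mathscr{E}(-V^-), \quad \text{where } V = \sup_{1 \le j < \infty} S_j.$

[HINT: If $V_n = \max_{1 \le j \le n} S_j$, then

$\mathscr{E}(e^{itV_n^+}) + \mathscr{E}(e^{-itV_n^-}) - 1 = \mathscr{E}(e^{itV_n}) = \mathscr{E}(e^{itV_{n-1}^+}) f(t);$

let $n \to \infty$, then $t \downarrow 0$. For the case $\mathscr{E}(X) = -\infty$, truncate. This result is due
to S. Port.]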

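*12. If $\mathscr{P}\{\alpha_{(0,\infty)} < \infty\} < 1$, then $\nu_n \to \nu$, $L_n \to L$, both limits being finite
a.e. and having the generating functions:

$\mathscr{E}\{r^{\nu}\} = \mathscr{E}\{r^{L}\} = \exp\left\{ \sum_{n=1}^{\infty} \dfrac{r^n - 1}{n} \mathscr{P}[S_n > 0] \right\}.$

[HINT: Consider $\lim_{m \to \infty} \sum_{n=0}^{\infty} \mathscr{P}\{\nu_m = n\} r^n$ and use (24) of Sec. 8.5.]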
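13. If $\mathscr{E}(X) = 0$, $\mathscr{E}(X^2) = \sigma^2$, $0 < \sigma^2 < \infty$, then as $n \to \infty$ we have

$\mathscr{P}[\nu_n = 0] \sim \dfrac{e^{c}}{\sqrt{n\pi}}, \quad \mathscr{P}[\nu_n = n] \sim \dfrac{e^{-c}}{\sqrt{n\pi}},$

where

$c = \sum_{n=1}^{\infty} \dfrac{1}{n} \left\{ \dfrac{1}{2} - \mathscr{P}[S_n > 0] \right\}.$
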
[HINT: Consider

$\lim_{r \uparrow 1} (1 - r)^{1/2} \sum_{n=0}^{\infty} r^n \mathscr{P}[\nu_n = 0]$

as in the proof of Theorem 8.4.6, and use the following lemma: if $p_n$ is a
decreasing sequence of positive numbers such that

$\sum_{k=1}^{n} p_k \sim 2 n^{1/2},$

then $p_n \sim n^{-1/2}$.]
*14. Prove Theorem 8.5.4. 
15. For an arbitrary random walk, we have

$\sum_{n} \dfrac{(-1)^n}{n} \mathscr{P}\{S_n > 0\} < \infty.$

[HINT: Half of the result is given in Exercise 18 of Sec. 8.3. For the remaining
case, apply proposition (C) in the O-form to equation (5) of Sec. 8.5 with $L_n$
replaced by $\nu_n$ and $t = 0$. This result is due to D. L. Hanson and M. Katz.]


9 | Conditioning. Markov property. Martingale


9.1 Basic properties of conditional expectation 


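If $\Lambda$ is any set in $\mathscr{F}$ with $\mathscr{P}(\Lambda) > 0$, we define $\mathscr{P}_{\Lambda}(\cdot)$ on $\mathscr{F}$ as follows:

(1) $\mathscr{P}_{\Lambda}(E) = \dfrac{\mathscr{P}(\Lambda \cap E)}{\mathscr{P}(\Lambda)}.$

Clearly $\mathscr{P}_{\Lambda}$ is a p.m. on $\mathscr{F}$; it is called the “conditional probability relative to
$\Lambda$”. The integral with respect to this p.m. is called the “conditional expectation
relative to $\Lambda$”:

(2) $\mathscr{E}_{\Lambda}(Y) = \int_{\Omega} Y(\omega)\, \mathscr{P}_{\Lambda}(d\omega) = \dfrac{1}{\mathscr{P}(\Lambda)} \int_{\Lambda} Y(\omega)\, \mathscr{P}(d\omega).$

If $\mathscr{P}(\Lambda) = 0$, we decree that $\mathscr{P}_{\Lambda}(E) = 0$ for every $E \in \mathscr{F}$. This convention is
expedient as in (3) and (4) below.
Let now $\{\Lambda_n, n \ge 1\}$ be a countable measurable partition of $\Omega$, namely:

$\Omega = \bigcup_{n=1}^{\infty} \Lambda_n, \quad \Lambda_n \in \mathscr{F}, \quad \Lambda_m \cap \Lambda_n = \emptyset \ \text{ if } m \ne n.$

Then we have

(3) $\mathscr{P}(E) = \sum_{n=1}^{\infty} \mathscr{P}(\Lambda_n \cap E) = \sum_{n=1}^{\infty} \mathscr{P}(\Lambda_n)\, \mathscr{P}_{\Lambda_n}(E);$

(4) $\mathscr{E}(Y) = \sum_{n=1}^{\infty} \int_{\Lambda_n} Y(\omega)\, \mathscr{P}(d\omega) = \sum_{n=1}^{\infty} \mathscr{P}(\Lambda_n)\, \mathscr{E}_{\Lambda_n}(Y);$

provided that $\mathscr{E}(Y)$ is defined. We have already used such decompositions
before, for example in the proof of Kolmogorov's inequality (Theorem 5.3.1):

$\int_{\Lambda} S_n^2\, d\mathscr{P} = \sum_{k=1}^{n} \mathscr{P}(\Lambda_k)\, \mathscr{E}_{\Lambda_k}(S_n^2).$

Another example is in the proof of Wald's equation (Theorem 5.5.3), where

$\mathscr{E}(S_N) = \sum_{k=1}^{\infty} \mathscr{P}(N = k)\, \mathscr{E}_{\{N = k\}}(S_N).$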


Thus the notion of conditioning gives rise to a decomposition when a given 
event or r.v. is considered on various parts of the sample space, on each of 
which some particular information may be obtained. 

Let us, however, reflect for a few moments on the even more elementary 
example below. From a pack of 52 playing cards one card is drawn and seen 
to be a spade. What is the probability that a second card drawn from the 
remaining deck will also be a spade? Since there are 51 cards left, among 
which are 12 spades, it is clear that the required probability is 12/51. But is 
this the conditional probability $\mathscr{P}_{\Lambda}(E)$ defined above, where $\Lambda$ = “first card
is a spade” and $E$ = “second card is a spade”? According to the definition,

$\dfrac{\mathscr{P}(\Lambda \cap E)}{\mathscr{P}(\Lambda)} = \dfrac{\dfrac{13 \cdot 12}{52 \cdot 51}}{\dfrac{13}{52}} = \dfrac{12}{51},$

where the denominator and numerator have been separately evaluated by
elementary combinatorial formulas. Hence the answer to the question above is
indeed “yes”; but this verification would be futile if we did not have another
way to evaluate $\mathscr{P}_{\Lambda}(E)$ as first indicated. Indeed, conditional probability is
often used to evaluate a joint probability by turning the formula (1) around as
follows:

$\mathscr{P}(\Lambda \cap E) = \mathscr{P}(\Lambda)\, \mathscr{P}_{\Lambda}(E) = \dfrac{13}{52} \cdot \dfrac{12}{51}.$

In general, it is used to reduce the calculation of a probability or expectation
to a modified one which can be handled more easily.
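
The card example can be checked by brute-force enumeration. The sketch below is an illustration only: the encoding of the deck (cards numbered 0 to 51, with 0 to 12 standing for the spades) and the helper names `is_spade` and `prob` are assumptions, not anything taken from the text.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding: cards 0..51, with cards 0..12 standing for the spades.
DECK = range(52)


def is_spade(card):
    return card < 13


# Sample space: all ordered pairs of distinct cards, each equally likely.
outcomes = list(permutations(DECK, 2))


def prob(event):
    """Probability of an event, given as a predicate on ordered pairs."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


p_first = prob(lambda w: is_spade(w[0]))                      # first card a spade
p_both = prob(lambda w: is_spade(w[0]) and is_spade(w[1]))    # both cards spades
p_second_given_first = p_both / p_first                       # formula (1)

print(p_first, p_both, p_second_given_first)
```

Here the joint probability is counted directly and the conditional probability is derived from formula (1), which is the reverse of the shortcut described in the text.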
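Let $\mathscr{G}$ be the Borel field generated by a countable partition $\{\Lambda_n\}$, for
example that by a discrete r.v. $X$, where $\Lambda_n = \{X = a_n\}$. Given an integrable
r.v. $Y$, we define the function $\mathscr{E}_{\mathscr{G}}(Y)$ on $\Omega$ by:

(5) $\mathscr{E}_{\mathscr{G}}(Y) = \sum_{n} \mathscr{E}_{\Lambda_n}(Y)\, 1_{\Lambda_n}(\cdot).$

Thus $\mathscr{E}_{\mathscr{G}}(Y)$ is a discrete r.v. that assumes the value $\mathscr{E}_{\Lambda_n}(Y)$ on the set $\Lambda_n$,
for each $n$. Now we can rewrite (4) as follows:

$\mathscr{E}(Y) = \sum_{n} \int_{\Lambda_n} \mathscr{E}_{\mathscr{G}}(Y)\, d\mathscr{P} = \int_{\Omega} \mathscr{E}_{\mathscr{G}}(Y)\, d\mathscr{P}.$

Furthermore, for any $\Lambda \in \mathscr{G}$, $\Lambda$ is the union of a subcollection of the $\Lambda_n$'s
(see Exercise 9 of Sec. 2.1), and the same manipulation yields

(6) $\forall \Lambda \in \mathscr{G}:\ \int_{\Lambda} Y\, d\mathscr{P} = \int_{\Lambda} \mathscr{E}_{\mathscr{G}}(Y)\, d\mathscr{P}.$

In particular, this shows that $\mathscr{E}_{\mathscr{G}}(Y)$ is integrable. Formula (6) equates two
integrals over the same set with an essential difference: while the integrand
$Y$ on the left belongs to $\mathscr{F}$, the integrand $\mathscr{E}_{\mathscr{G}}(Y)$ on the right belongs to the
subfield $\mathscr{G}$. [The fact that $\mathscr{E}_{\mathscr{G}}(Y)$ is discrete is incidental to the nature of $\mathscr{G}$.]
It holds for every $\Lambda$ in the subfield $\mathscr{G}$, but not necessarily for a set in $\mathscr{F} \setminus \mathscr{G}$.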
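
To make (5) and (6) concrete, here is a minimal sketch on a finite sample space. Everything in it is an illustrative assumption: the six-point space, the uniform weights, the r.v. `Y` and the three-cell partition are not from the text. It builds the version (5) cell by cell and checks the defining relation (6) on every union of cells, then shows that (6) can fail on a set outside the subfield.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite model, chosen only for illustration.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
Y = {w: Fraction(w * w) for w in omega}      # an arbitrary r.v. on omega
partition = [{0, 1}, {2, 3, 4}, {5}]         # cells generating the subfield


def integral(f, A):
    """Integral of f over the set A with respect to P."""
    return sum(f[w] * P[w] for w in A)


def cond_exp(f, cells):
    """Version (5): on each cell, the P-weighted average of f over that cell."""
    out = {}
    for cell in cells:
        p_cell = sum(P[w] for w in cell)
        avg = integral(f, cell) / p_cell if p_cell > 0 else Fraction(0)
        for w in cell:
            out[w] = avg
    return out


EY_G = cond_exp(Y, partition)

# Every set of the subfield is a union of cells; check (6) on each of them.
for r in range(len(partition) + 1):
    for chosen in combinations(partition, r):
        A = set().union(*chosen)
        assert integral(Y, A) == integral(EY_G, A)

# On a set that is not a union of cells, the two integrals may differ.
print(integral(Y, {0}), integral(EY_G, {0}))
```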
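Now suppose that there are two functions $\varphi_1$ and $\varphi_2$, both belonging to $\mathscr{G}$,
such that

$\forall \Lambda \in \mathscr{G}:\ \int_{\Lambda} Y\, d\mathscr{P} = \int_{\Lambda} \varphi_i\, d\mathscr{P}, \quad i = 1, 2.$

Let $\Lambda = \{\omega : \varphi_1(\omega) > \varphi_2(\omega)\}$, then $\Lambda \in \mathscr{G}$ and so

$\int_{\Lambda} (\varphi_1 - \varphi_2)\, d\mathscr{P} = 0.$

Hence $\mathscr{P}(\Lambda) = 0$; interchanging $\varphi_1$ and $\varphi_2$ above we conclude that $\varphi_1 = \varphi_2$
a.e. We have therefore proved that the $\mathscr{E}_{\mathscr{G}}(Y)$ in (6) is unique up to an
equivalence. Let us agree to use $\mathscr{E}_{\mathscr{G}}(Y)$ or $\mathscr{E}(Y \mid \mathscr{G})$ to denote the corresponding
equivalence class, and let us call any particular member of the class a “version”
of the conditional expectation.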

The results above are valid for an arbitrary Borel subfield $\mathscr{G}$ and will be
stated in the theorem below.


Theorem 9.1.1. If $\mathscr{E}(|Y|) < \infty$ and $\mathscr{G}$ is a Borel subfield of $\mathscr{F}$, then there
exists a unique equivalence class of integrable r.v.'s $\mathscr{E}(Y \mid \mathscr{G})$ belonging to $\mathscr{G}$
such that (6) holds.

PROOF. Consider the set function $\nu$ on $\mathscr{G}$:

$\forall \Lambda \in \mathscr{G}:\ \nu(\Lambda) = \int_{\Lambda} Y\, d\mathscr{P}.$

It is finite-valued and countably additive, hence a “signed measure” on $\mathscr{G}$.
If $\mathscr{P}(\Lambda) = 0$, then $\nu(\Lambda) = 0$; hence it is absolutely continuous with respect
to $\mathscr{P}$: $\nu \ll \mathscr{P}$. The theorem then follows from the Radon-Nikodym theorem
(see, e.g., Royden [5] or Halmos [4]), the resulting “derivative” $d\nu/d\mathscr{P}$ being
what we have denoted by $\mathscr{E}(Y \mid \mathscr{G})$.


Having established the existence and uniqueness, we may repeat the defi- 
nition as follows. 


DEFINITION OF CONDITIONAL EXPECTATION. Given an integrable r.v. $Y$ and a
Borel subfield $\mathscr{G}$, the conditional expectation $\mathscr{E}(Y \mid \mathscr{G})$ of $Y$ relative to $\mathscr{G}$ is
any one of the equivalence class of r.v.'s on $\Omega$ satisfying the two properties:


(a) it belongs to $\mathscr{G}$;
(b) it has the same integral as $Y$ over any set in $\mathscr{G}$.


We shall refer to (b), or equivalently formula (6) above, as the “defining
relation” of the conditional expectation. In practice as well as in theory, the
identification of conditional expectations or relations between them is
established by verifying the two properties listed above. When $Y = 1_{\Delta}$, where
$\Delta \in \mathscr{F}$, we write

$\mathscr{P}(\Delta \mid \mathscr{G}) = \mathscr{E}(1_{\Delta} \mid \mathscr{G})$

and call it the “conditional probability of $\Delta$ relative to $\mathscr{G}$”. Specifically,
$\mathscr{P}(\Delta \mid \mathscr{G})$ is any one of the equivalence class of r.v.'s belonging to $\mathscr{G}$ and satisfying
the condition

(7) $\forall \Lambda \in \mathscr{G}:\ \mathscr{P}(\Delta \cap \Lambda) = \int_{\Lambda} \mathscr{P}(\Delta \mid \mathscr{G})\, d\mathscr{P}.$

It follows from the definition that for an integrable r.v. $Y$ and a Borel
subfield $\mathscr{G}$, we have

$\int_{\Lambda} [Y - \mathscr{E}(Y \mid \mathscr{G})]\, d\mathscr{P} = 0,$

for every $\Lambda \in \mathscr{G}$, and consequently also

$\mathscr{E}\{[Y - \mathscr{E}(Y \mid \mathscr{G})] Z\} = 0$

for every bounded $Z \in \mathscr{G}$ (why?). This implies the decomposition:

$Y = Y' + Y'' \quad \text{where } Y' = \mathscr{E}(Y \mid \mathscr{G}) \text{ and } Y'' \perp \mathscr{G},$

where “$Y'' \perp \mathscr{G}$” means that $\mathscr{E}(Y''Z) = 0$ for every bounded $Z \in \mathscr{G}$. In the
language of Banach space, $Y'$ is the “projection” of $Y$ on $\mathscr{G}$ and $Y''$ its
“orthogonal complement”.

For the Borel field $\mathscr{F}\{X\}$ generated by the r.v. $X$, we write also $\mathscr{E}(Y \mid X)$
for $\mathscr{E}(Y \mid \mathscr{F}\{X\})$; similarly for $\mathscr{E}(Y \mid X_1, \ldots, X_n)$. The next theorem clarifies
certain useful connections.


Theorem 9.1.2. One version of the conditional expectation $\mathscr{E}(Y \mid X)$ is given
by $\varphi(X)$, where $\varphi$ is a Borel measurable function on $\mathscr{R}^1$. Furthermore, if we
define the signed measure $\lambda$ on $\mathscr{B}^1$ by

$\forall B \in \mathscr{B}^1:\ \lambda(B) = \int_{X^{-1}(B)} Y\, d\mathscr{P},$

and the p.m. of $X$ by $\mu$, then $\varphi$ is one version of the Radon-Nikodym derivative
$d\lambda/d\mu$.


PROOF. The first assertion of the theorem is a particular case of the 
following lemma. 


Lemma. If $Z \in \mathscr{F}\{X\}$, then $Z = \varphi(X)$ for some extended-valued Borel
measurable function $\varphi$.


PROOF OF THE LEMMA. It is sufficient to prove this for a bounded positive $Z$
(why?). Then there exists a sequence of simple functions $Z_m$ which increases
to $Z$ everywhere, and each $Z_m$ is of the form

$\sum_{j=1}^{\ell} c_j 1_{\Lambda_j},$

where $\Lambda_j \in \mathscr{F}\{X\}$. Hence $\Lambda_j = X^{-1}(B_j)$ for some $B_j \in \mathscr{B}^1$ (see Exercise 11
of Sec. 3.1). Thus if we take

$\varphi_m = \sum_{j=1}^{\ell} c_j 1_{B_j},$

we have $Z_m = \varphi_m(X)$. Since $\varphi_m(X) \to Z$, it follows that $\varphi_m$ converges on the
range of $X$. But this range need not be Borel or even Lebesgue measurable
(Exercise 6 of Sec. 3.1). To overcome this nuisance, we put

$\forall x \in \mathscr{R}^1:\ \varphi(x) = \limsup_{m \to \infty} \varphi_m(x).$

Then $Z = \lim_m \varphi_m(X) = \varphi(X)$, and $\varphi$ is Borel measurable, proving the lemma.


To prove the second assertion: given any $B \in \mathscr{B}^1$, let $\Lambda = X^{-1}(B)$, then
by Theorem 3.2.2 we have

$\int_{\Lambda} \mathscr{E}(Y \mid X)\, d\mathscr{P} = \int_{\Omega} 1_B(X)\varphi(X)\, d\mathscr{P} = \int_{\mathscr{R}^1} 1_B(x)\varphi(x)\, d\mu = \int_{B} \varphi(x)\, d\mu.$

Hence by (6),

$\lambda(B) = \int_{\Lambda} Y\, d\mathscr{P} = \int_{B} \varphi(x)\, d\mu.$

This being true for every $B$ in $\mathscr{B}^1$, it follows that $\varphi$ is a version of the derivative
$d\lambda/d\mu$. Theorem 9.1.2 is proved.


As a consequence of the theorem, the function $\mathscr{E}(Y \mid X)$ of $\omega$ is constant
a.e. on each set on which $X(\omega)$ is constant. By an abuse of notation, the $\varphi(x)$
above is sometimes written as $\mathscr{E}(Y \mid X = x)$. We may then write, for example,
for each real $c$:

$\int_{\{X \le c\}} Y\, d\mathscr{P} = \int_{(-\infty, c]} \mathscr{E}(Y \mid X = x)\, d\mathscr{P}\{X \le x\}.$

Generalization to a finite number of $X$'s is straightforward. Thus one
version of $\mathscr{E}(Y \mid X_1, \ldots, X_n)$ is $\varphi(X_1, \ldots, X_n)$, where $\varphi$ is an $n$-dimensional
Borel measurable function, and by $\mathscr{E}(Y \mid X_1 = x_1, \ldots, X_n = x_n)$ is meant
$\varphi(x_1, \ldots, x_n)$.


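It is worthwhile to point out the extreme cases of $\mathscr{E}(Y \mid \mathscr{G})$:

$\mathscr{E}(Y \mid \mathscr{J}) = \mathscr{E}(Y), \quad \mathscr{E}(Y \mid \mathscr{F}) = Y; \quad \text{a.e.}$

where $\mathscr{J}$ is the trivial field $\{\emptyset, \Omega\}$. If $\mathscr{G}$ is the field generated by one set
$\Lambda$: $\{\emptyset, \Lambda, \Lambda^c, \Omega\}$, then $\mathscr{E}(Y \mid \mathscr{G})$ is equal to $\mathscr{E}(Y \mid \Lambda)$ on $\Lambda$ and $\mathscr{E}(Y \mid \Lambda^c)$
on $\Lambda^c$. All these equations, as hereafter, are between equivalence classes of
r.v.'s.

We shall suppose the pair $(\mathscr{F}, \mathscr{P})$ to be complete and each Borel subfield
$\mathscr{G}$ of $\mathscr{F}$ to be augmented (see Exercise 20 of Sec. 2.2). But even if $\mathscr{G}$ is
not augmented and $\overline{\mathscr{G}}$ is its augmentation, it follows from the definition that
$\mathscr{E}(Y \mid \mathscr{G}) = \mathscr{E}(Y \mid \overline{\mathscr{G}})$, since an r.v. belonging to $\overline{\mathscr{G}}$ is equal to one belonging
to $\mathscr{G}$ almost everywhere (why?). Finally, if $\mathscr{G}_0$ is a field generating $\mathscr{G}$, or just
a collection of sets whose finite disjoint unions form such a field, then the
validity of (6) for each $\Lambda$ in $\mathscr{G}_0$ is sufficient for (6) as it stands. This follows
easily from Theorem 2.2.3.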

The next result is basic. 


Theorem 9.1.3. Let $Y$ and $YZ$ be integrable r.v.'s and $Z \in \mathscr{G}$; then we have
(8) $\mathscr{E}(YZ \mid \mathscr{G}) = Z\, \mathscr{E}(Y \mid \mathscr{G})$ a.e.

[Here “a.e.” is necessary, since we have not stipulated to regard $Z$ as an
equivalence class of r.v.'s, although conditional expectations are so regarded
by definition. Nevertheless we shall sometimes omit such obvious “a.e.'s”
from now on.]


PROOF. As usual we may suppose $Y \ge 0$, $Z \ge 0$ (see property (ii) below).
The proof consists in observing that the right member of (8) belongs to $\mathscr{G}$ and
satisfies the defining relation for the left member, namely:

(9) $\forall \Lambda \in \mathscr{G}:\ \int_{\Lambda} Z\, \mathscr{E}(Y \mid \mathscr{G})\, d\mathscr{P} = \int_{\Lambda} ZY\, d\mathscr{P}.$

For (9) is true if $Z = 1_{\Delta}$, where $\Delta \in \mathscr{G}$, hence it is true if $Z$ is a simple
r.v. belonging to $\mathscr{G}$ and consequently also for each $Z$ in $\mathscr{G}$ by monotone
convergence, whether the limits are finite or positive infinite. Note that the
integrability of $Z\, \mathscr{E}(Y \mid \mathscr{G})$ is part of the assertion of the theorem.
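
Relation (8) can be observed directly when the subfield is generated by a finite partition. The following is a sketch under assumed data: the six-point space, the weights `P`, the cells, and the r.v.'s `Y`, `Z`, `W` are all hypothetical. `Z` is constant on each cell, so it belongs to the subfield and can be pulled out; `W` is not constant on cells, and the identity fails for it.

```python
from fractions import Fraction

# Hypothetical finite model; all names and numbers are illustrative.
P = {w: Fraction(c, 12) for w, c in enumerate((1, 3, 2, 2, 2, 2))}
CELLS = [{0, 1}, {2, 3}, {4, 5}]               # partition generating the subfield
Y = {w: Fraction(w + 1) for w in P}
Z = {0: 2, 1: 2, 2: -1, 3: -1, 4: 5, 5: 5}     # constant on cells: measurable
W = {w: w for w in P}                          # not constant on cells


def cond_exp(f, cells):
    """Conditional expectation given the partition: cell-wise weighted average."""
    out = {}
    for cell in cells:
        p_cell = sum(P[w] for w in cell)
        avg = sum(f[w] * P[w] for w in cell) / p_cell
        for w in cell:
            out[w] = avg
    return out


def times(f, g):
    """Pointwise product of two r.v.'s."""
    return {w: f[w] * g[w] for w in P}


lhs = cond_exp(times(Y, Z), CELLS)             # E(YZ | G)
rhs = times(Z, cond_exp(Y, CELLS))             # Z E(Y | G)
assert lhs == rhs

# With W in place of Z the two sides differ, since W is not measurable.
assert cond_exp(times(Y, W), CELLS) != times(W, cond_exp(Y, CELLS))
```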

Recall that when $\mathscr{G}$ is generated by a partition $\{\Lambda_n\}$, we have exhibited a
specific version (5) of $\mathscr{E}(Y \mid \mathscr{G})$. Now consider the corresponding $\mathscr{P}(M \mid \mathscr{G})$
as a function of the pair $(M, \omega)$:

$\mathscr{P}(M \mid \mathscr{G})(\omega) = \sum_{n} \mathscr{P}(M \mid \Lambda_n)\, 1_{\Lambda_n}(\omega).$

For each fixed $M$, as a function of $\omega$ this is a specific version of $\mathscr{P}(M \mid \mathscr{G})$.
For each fixed $\omega_0$, as a function of $M$ this is a p.m. on $\mathscr{F}$ given by $\mathscr{P}\{\cdot \mid \Lambda_m\}$
for $\omega_0 \in \Lambda_m$. Let us denote for a moment the family of p.m.'s arising in this
manner by $C(\omega_0, \cdot)$. We have then for each integrable r.v. $Y$ and each $\omega_0$ in $\Omega$:

(10) $\mathscr{E}(Y \mid \mathscr{G})(\omega_0) = \sum_{n} \mathscr{E}(Y \mid \Lambda_n)\, 1_{\Lambda_n}(\omega_0) = \int_{\Omega} Y\, C(\omega_0, d\omega).$

Thus the specific version of $\mathscr{E}(Y \mid \mathscr{G})$ may be evaluated, at each $\omega_0 \in \Omega$, by
integrating $Y$ with respect to the p.m. $C(\omega_0, \cdot)$. In this case the conditional
expectation $\mathscr{E}(\cdot \mid \mathscr{G})$ as a functional on integrable r.v.'s is an integral in the
literal sense. But in general such a representation is impossible (see Doob
[16, Sec. 1.9]) and we must fall back on the defining relations to deduce its
properties, usually from the unconditional analogues. Below are some of the
simplest examples, in which $X$ and $X_n$ are integrable r.v.'s.


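(i) If $X \in \mathscr{G}$, then $\mathscr{E}(X \mid \mathscr{G}) = X$ a.e.; this is true in particular if $X$ is a
constant a.e.
(ii) $\mathscr{E}(X_1 + X_2 \mid \mathscr{G}) = \mathscr{E}(X_1 \mid \mathscr{G}) + \mathscr{E}(X_2 \mid \mathscr{G})$.
(iii) If $X_1 \le X_2$, then $\mathscr{E}(X_1 \mid \mathscr{G}) \le \mathscr{E}(X_2 \mid \mathscr{G})$.
(iv) $|\mathscr{E}(X \mid \mathscr{G})| \le \mathscr{E}(|X| \mid \mathscr{G})$.
(v) If $X_n \uparrow X$, then $\mathscr{E}(X_n \mid \mathscr{G}) \uparrow \mathscr{E}(X \mid \mathscr{G})$.
(vi) If $X_n \downarrow X$, then $\mathscr{E}(X_n \mid \mathscr{G}) \downarrow \mathscr{E}(X \mid \mathscr{G})$.
(vii) If $|X_n| \le Y$ where $\mathscr{E}(Y) < \infty$ and $X_n \to X$, then $\mathscr{E}(X_n \mid \mathscr{G}) \to
\mathscr{E}(X \mid \mathscr{G})$.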


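To illustrate, (iii) is proved by observing that for each $\Lambda \in \mathscr{G}$:

$\int_{\Lambda} \mathscr{E}(X_1 \mid \mathscr{G})\, d\mathscr{P} = \int_{\Lambda} X_1\, d\mathscr{P} \le \int_{\Lambda} X_2\, d\mathscr{P} = \int_{\Lambda} \mathscr{E}(X_2 \mid \mathscr{G})\, d\mathscr{P}.$

Hence if $\Lambda = \{\mathscr{E}(X_1 \mid \mathscr{G}) > \mathscr{E}(X_2 \mid \mathscr{G})\}$, we have $\mathscr{P}(\Lambda) = 0$. The inequality
(iv) may be proved by (ii), (iii), and the equation $X = X^+ - X^-$. To prove
(v), let the limit of $\mathscr{E}(X_n \mid \mathscr{G})$ be $Z$, which exists a.e. by (iii). Then for each
$\Lambda \in \mathscr{G}$, we have by the monotone convergence theorem:

$\int_{\Lambda} Z\, d\mathscr{P} = \lim_{n} \int_{\Lambda} \mathscr{E}(X_n \mid \mathscr{G})\, d\mathscr{P} = \lim_{n} \int_{\Lambda} X_n\, d\mathscr{P} = \int_{\Lambda} X\, d\mathscr{P}.$

Thus $Z$ satisfies the defining relation for $\mathscr{E}(X \mid \mathscr{G})$, and it belongs to $\mathscr{G}$ with
the $\mathscr{E}(X_n \mid \mathscr{G})$'s, hence $Z = \mathscr{E}(X \mid \mathscr{G})$.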

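To appreciate the caution that is necessary in handling conditional
expectations, let us consider the Cauchy-Schwarz inequality:

$\mathscr{E}(|XY| \mid \mathscr{G})^2 \le \mathscr{E}(X^2 \mid \mathscr{G})\, \mathscr{E}(Y^2 \mid \mathscr{G}).$

If we try to extend one of the usual proofs based on the positiveness of the
quadratic form in $\lambda$: $\mathscr{E}((X + \lambda Y)^2 \mid \mathscr{G})$, the question arises that for each $\lambda$
the quantity is defined only up to a null set $N_{\lambda}$, and the union of these over
all $\lambda$ cannot be ignored without comment. The reader is advised to think this
difficulty through to its logical end, and then get out of it by restricting the
$\lambda$'s to the rationals. Here is another way out: start from the following trivial
inequality:

$\dfrac{|X||Y|}{\alpha\beta} \le \dfrac{X^2}{2\alpha^2} + \dfrac{Y^2}{2\beta^2},$

where $\alpha = \mathscr{E}(X^2 \mid \mathscr{G})^{1/2}$, $\beta = \mathscr{E}(Y^2 \mid \mathscr{G})^{1/2}$, and $\alpha\beta > 0$; apply the operation
$\mathscr{E}\{\cdot \mid \mathscr{G}\}$ using (ii) and (iii) above to obtain

$\mathscr{E}\left\{ \dfrac{|XY|}{\alpha\beta} \,\Big|\, \mathscr{G} \right\} \le \dfrac{1}{2} \mathscr{E}\left\{ \dfrac{X^2}{\alpha^2} \,\Big|\, \mathscr{G} \right\} + \dfrac{1}{2} \mathscr{E}\left\{ \dfrac{Y^2}{\beta^2} \,\Big|\, \mathscr{G} \right\}.$

Now use Theorem 9.1.3 to infer that this can be reduced to

$\dfrac{1}{\alpha\beta} \mathscr{E}\{|XY| \mid \mathscr{G}\} \le \dfrac{1}{2} \dfrac{\alpha^2}{\alpha^2} + \dfrac{1}{2} \dfrac{\beta^2}{\beta^2} = 1,$

the desired inequality.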
The following theorem is a generalization of Jensen's inequality in Sec. 3.2.

Theorem 9.1.4. If $\varphi$ is a convex function on $\mathscr{R}^1$ and $X$ and $\varphi(X)$ are
integrable r.v.'s, then for each $\mathscr{G}$:

(11) $\varphi(\mathscr{E}(X \mid \mathscr{G})) \le \mathscr{E}(\varphi(X) \mid \mathscr{G}).$


PROOF. If $X$ is a simple r.v. taking the values $\{y_j\}$ on the sets $\{\Lambda_j\}$, $1 \le
j \le n$, which form a partition of $\Omega$, we have

$\mathscr{E}(X \mid \mathscr{G}) = \sum_{j=1}^{n} y_j\, \mathscr{P}(\Lambda_j \mid \mathscr{G}),$
$\mathscr{E}(\varphi(X) \mid \mathscr{G}) = \sum_{j=1}^{n} \varphi(y_j)\, \mathscr{P}(\Lambda_j \mid \mathscr{G}),$

where $\sum_{j=1}^{n} \mathscr{P}(\Lambda_j \mid \mathscr{G}) = 1$ a.e. Hence (11) is true in this case by the property
of convexity. In general let $\{X_m\}$ be a sequence of simple r.v.'s converging to
$X$ a.e. and satisfying $|X_m| \le |X|$ for all $m$ (see Exercise 7 of Sec. 3.2). If we
let $m \to \infty$ below:

(12) $\varphi(\mathscr{E}(X_m \mid \mathscr{G})) \le \mathscr{E}(\varphi(X_m) \mid \mathscr{G}),$

the left-hand member converges to the left-hand member of (11) by the
continuity of $\varphi$, but we need dominated convergence on the right-hand side. To get
this we first consider $\varphi_n$, which is obtained from $\varphi$ by replacing the graph of
$\varphi$ outside $(-n, n)$ with tangential lines. Thus for each $n$ there is a constant
$C_n$ such that

$\forall x \in \mathscr{R}^1:\ |\varphi_n(x)| \le C_n(|x| + 1).$

Consequently, we have

$|\varphi_n(X_m)| \le C_n(|X_m| + 1) \le C_n(|X| + 1)$

and the last term is integrable by hypothesis. It now follows from property
(vii) of conditional expectations that

$\lim_{m \to \infty} \mathscr{E}(\varphi_n(X_m) \mid \mathscr{G}) = \mathscr{E}(\varphi_n(X) \mid \mathscr{G}).$

This establishes (11) when $\varphi$ is replaced by $\varphi_n$. Letting $n \to \infty$ we have
$\varphi_n \uparrow \varphi$ and $\varphi_n(X)$ is integrable; hence (11) follows for a general convex $\varphi$,
by monotone convergence (v).

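Here is an alternative proof, slightly more elegant and more delicate. We
have for any $x$ and $y$:

$\varphi(x) - \varphi(y) \ge \varphi'(y)(x - y)$

where $\varphi'$ is the right-hand derivative of $\varphi$. Hence

$\varphi(X) - \varphi(\mathscr{E}(X \mid \mathscr{G})) \ge \varphi'(\mathscr{E}(X \mid \mathscr{G}))(X - \mathscr{E}(X \mid \mathscr{G})).$

The right member may not be integrable; but let $\Lambda = \{\omega : |\mathscr{E}(X \mid \mathscr{G})| \le A\}$
for $A > 0$. Replace $X$ by $X 1_{\Lambda}$ in the above, take expectations of both sides, and
let $A \uparrow \infty$. Observe that

$\mathscr{E}\{\varphi(X 1_{\Lambda}) \mid \mathscr{G}\} = \mathscr{E}\{\varphi(X) 1_{\Lambda} + \varphi(0) 1_{\Lambda^c} \mid \mathscr{G}\} = \mathscr{E}\{\varphi(X) \mid \mathscr{G}\} 1_{\Lambda} + \varphi(0) 1_{\Lambda^c}.$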
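We now come to the most important property of conditional expectation
relating to changing fields. Note that when $\Lambda = \Omega$, the defining relation (6)
may be written as

$\mathscr{E}_{\mathscr{J}}(\mathscr{E}_{\mathscr{G}}(Y)) = \mathscr{E}_{\mathscr{J}}(Y) = \mathscr{E}_{\mathscr{G}}(\mathscr{E}_{\mathscr{J}}(Y)).$

This has an immediate generalization.


Theorem 9.1.5. If $Y$ is integrable and $\mathscr{F}_1 \subset \mathscr{F}_2$, then

(13) $\mathscr{E}_{\mathscr{F}_1}(Y) = \mathscr{E}_{\mathscr{F}_2}(Y)$ if and only if $\mathscr{E}_{\mathscr{F}_2}(Y) \in \mathscr{F}_1$;
and
(14) $\mathscr{E}_{\mathscr{F}_1}(\mathscr{E}_{\mathscr{F}_2}(Y)) = \mathscr{E}_{\mathscr{F}_1}(Y) = \mathscr{E}_{\mathscr{F}_2}(\mathscr{E}_{\mathscr{F}_1}(Y)).$


PROOF. Since $Y$ satisfies trivially the defining relation for $\mathscr{E}(Y \mid \mathscr{F}_1)$, it
will be equal to the latter if and only if $Y \in \mathscr{F}_1$. Now if we replace our basic
$\mathscr{F}$ by $\mathscr{F}_2$ and $Y$ by $\mathscr{E}_{\mathscr{F}_2}(Y)$, the assertion (13) ensues. Next, since

$\mathscr{E}_{\mathscr{F}_1}(Y) \in \mathscr{F}_1 \subset \mathscr{F}_2,$

the second equation in (14) follows from the same observation. It remains
to prove the first equation in (14). Let $\Lambda \in \mathscr{F}_1$, then $\Lambda \in \mathscr{F}_2$; applying the
defining relation twice, we obtain

$\int_{\Lambda} \mathscr{E}_{\mathscr{F}_1}(\mathscr{E}_{\mathscr{F}_2}(Y))\, d\mathscr{P} = \int_{\Lambda} \mathscr{E}_{\mathscr{F}_2}(Y)\, d\mathscr{P} = \int_{\Lambda} Y\, d\mathscr{P}.$

Hence $\mathscr{E}_{\mathscr{F}_1}(\mathscr{E}_{\mathscr{F}_2}(Y))$ satisfies the defining relation for $\mathscr{E}_{\mathscr{F}_1}(Y)$; since it belongs
to $\mathscr{F}_1$, it is equal to the latter.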
As a particular case, we note, for example,

(15) $\mathscr{E}\{\mathscr{E}(Y \mid X_1, X_2) \mid X_1\} = \mathscr{E}(Y \mid X_1) = \mathscr{E}\{\mathscr{E}(Y \mid X_1) \mid X_1, X_2\}.$

To understand the meaning of this formula, we may think of $X_1$ and $X_2$ as
discrete, each producing a countable partition. The superimposition of both
partitions yields sets of the form $\{\Lambda_j \cap M_k\}$. The “inner” expectation on the
left of (15) is the result of replacing $Y$ by its “average” over each $\Lambda_j \cap M_k$.
Now if we replace this average r.v. again by its average over each $\Lambda_j$, then
the result is the same as if we had simply replaced $Y$ by its average over each
$\Lambda_j$. The second equation has a similar interpretation.
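
The averaging interpretation can be reproduced on a small example. The sketch below assumes a hypothetical six-point space with weights `P`, a coarse partition (playing the role of the sets produced by $X_1$) and a finer partition refining it (the superimposed sets); none of these choices come from the text. It checks both equalities of (14), hence of (15), with exact fractions.

```python
from fractions import Fraction

# Hypothetical finite model; weights and partitions are illustrative only.
P = {w: Fraction(c, 12) for w, c in enumerate((1, 2, 3, 1, 2, 3))}
Y = {w: Fraction(w * w - 3) for w in P}
COARSE = [{0, 1, 2}, {3, 4, 5}]            # smaller field
FINE = [{0, 1}, {2}, {3}, {4, 5}]          # larger field; refines COARSE


def cond_exp(f, cells):
    """Replace f by its P-weighted average over each cell of the partition."""
    out = {}
    for cell in cells:
        p_cell = sum(P[w] for w in cell)
        avg = sum(f[w] * P[w] for w in cell) / p_cell
        for w in cell:
            out[w] = avg
    return out


direct = cond_exp(Y, COARSE)
fine_then_coarse = cond_exp(cond_exp(Y, FINE), COARSE)   # first member of (14)
coarse_then_fine = cond_exp(cond_exp(Y, COARSE), FINE)   # last member of (14)

assert fine_then_coarse == direct == coarse_then_fine
print(direct[0], cond_exp(Y, FINE)[0])
```

Averaging over the fine cells first and then over the coarse ones gives the same result as averaging over the coarse cells directly, while the fine average itself is a different r.v.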

Another kind of simple situation is afforded by the probability triple
$(\mathscr{U}^n, \mathscr{B}^n, m^n)$ discussed in Example 2 of Sec. 3.3. Let $x_1, \ldots, x_n$ be the
coordinate r.v.'s and $y = f(x_1, \ldots, x_n)$, where $f$ is (Borel) measurable and
integrable. It is easy to see that for $1 \le k \le n - 1$,

$\mathscr{E}(y \mid x_1, \ldots, x_k) = \int_0^1 \cdots \int_0^1 f(x_1, \ldots, x_n)\, dx_{k+1} \cdots dx_n,$

while for $k = n$, the left side is just $y$ (a.e.). Thus, taking conditional
expectation with respect to certain coordinate r.v.'s here amounts to integrating out
the other r.v.'s. The first equation in (15) in this case merely asserts the
possibility of iterated integration, while the second reduces to a banality, which we
leave to the reader to write out.


EXERCISES 


1. Prove Lemma 2 in Sec. 7.2 by using conditional probabilities. 

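2. Let $\{\Lambda_n\}$ be a countable measurable partition of $\Omega$, and $E \in \mathscr{F}$ with
$\mathscr{P}(E) > 0$; then we have for each $m$:

$\mathscr{P}_E(\Lambda_m) = \dfrac{\mathscr{P}(\Lambda_m)\, \mathscr{P}_{\Lambda_m}(E)}{\sum_{n} \mathscr{P}(\Lambda_n)\, \mathscr{P}_{\Lambda_n}(E)}.$

[This is Bayes' rule.]
*3. If $X$ is an integrable r.v., $Y$ a bounded r.v., and $\mathscr{G}$ a Borel subfield,
then we have

$\mathscr{E}\{\mathscr{E}(X \mid \mathscr{G})\, Y\} = \mathscr{E}\{X\, \mathscr{E}(Y \mid \mathscr{G})\}.$

4. Prove Fatou's lemma and Lebesgue's dominated convergence theorem
for conditional expectations.

*5. Give an example where $\mathscr{E}(\mathscr{E}(Y \mid X_1) \mid X_2) \ne \mathscr{E}(\mathscr{E}(Y \mid X_2) \mid X_1)$.
[HINT: It is sufficient to give an example where $\mathscr{E}(X \mid Y) \ne \mathscr{E}\{\mathscr{E}(X \mid Y) \mid X\}$;
consider an $\Omega$ with three points.]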

*6. Prove that $\sigma^2(\mathscr{E}_{\mathscr{G}}(Y)) \le \sigma^2(Y)$, where $\sigma^2$ is the variance.

7. If the random vector $(X, Y)$ has the probability density function $p(\cdot, \cdot)$ and
$X$ is integrable, then one version of $\mathscr{E}(X \mid X + Y = z)$ is given by

$\int x\, p(x, z - x)\, dx \Big/ \int p(x, z - x)\, dx.$

*8. In the case above, there exists an integrable function $\varphi(\cdot, \cdot)$ with the
property that for each $B \in \mathscr{B}^1$,

$\int_{B} \varphi(x, y)\, dy$

is a version of $\mathscr{P}\{Y \in B \mid X = x\}$. [This is called a “conditional density
function in the wide sense” and

$\Phi(x, \eta) = \int_{-\infty}^{\eta} \varphi(x, y)\, dy$

is the corresponding conditional distribution in the wide sense:

$\Phi(x, \eta) = \mathscr{P}\{Y \le \eta \mid X = x\}.$

The term “wide sense” refers to the fact that these are functions on $\mathscr{R}^1$ rather
than on $\Omega$; see Doob [1, Sec. 1.9].]

9. Let the $p(\cdot, \cdot)$ above be the 2-dimensional normal density:

$\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\dfrac{1}{2(1 - \rho^2)} \left( \dfrac{x^2}{\sigma_1^2} - \dfrac{2\rho x y}{\sigma_1\sigma_2} + \dfrac{y^2}{\sigma_2^2} \right) \right\},$

where $\sigma_1 > 0$, $\sigma_2 > 0$, $0 < \rho < 1$. Find the $\varphi$ mentioned in Exercise 8 and

$\int_{-\infty}^{\infty} y\, \varphi(x, y)\, dy.$

The latter should be a version of $\mathscr{E}(Y \mid X = x)$; verify it.
10. Let $\mathscr{G}$ be a B.F., $X$ and $Y$ two r.v.'s such that

$\mathscr{E}(Y^2 \mid \mathscr{G}) = X^2, \quad \mathscr{E}(Y \mid \mathscr{G}) = X.$

Then $Y = X$ a.e.
11. As in Exercise 10 but suppose now for any $f \in C_K$:

$\mathscr{E}\{X^2 \mid f(X)\} = \mathscr{E}\{Y^2 \mid f(X)\}; \quad \mathscr{E}\{X \mid f(X)\} = \mathscr{E}\{Y \mid f(X)\}.$

Then $Y = X$ a.e. [HINT: By a monotone class theorem the equations hold for
$f = 1_B$, $B \in \mathscr{B}^1$; now apply Exercise 10 with $\mathscr{G} = \mathscr{F}\{X\}$.]

12. Recall that $X_n$ in $L^1$ converges weakly in $L^1$ to $X$ iff $\mathscr{E}(X_n Y) \to
\mathscr{E}(XY)$ for every bounded r.v. $Y$. Prove that this implies $\mathscr{E}(X_n \mid \mathscr{G})$ converges
weakly in $L^1$ to $\mathscr{E}(X \mid \mathscr{G})$ for any Borel subfield $\mathscr{G}$ of $\mathscr{F}$.

*13. Let $S$ be an r.v. such that $\mathscr{P}\{S > t\} = e^{-t}$, $t > 0$. Compute $\mathscr{E}\{S \mid S \wedge t\}$
and $\mathscr{E}\{S \mid S \vee t\}$ for each $t > 0$.


9.2 Conditional independence; Markov property


In this section we shall first apply the concept of conditioning to independent 
r.v.’s and to random walks, the two basic types of stochastic processes that 
have been extensively studied in this book; then we shall generalize to a 
Markov process. Another important generalization, the martingale theory, will 
be taken up in the next three sections. 

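All the B.F.'s (Borel fields) below will be subfields of $\mathscr{F}$. The B.F.'s
$\{\mathscr{F}_{\alpha}, \alpha \in A\}$, where $A$ is an arbitrary index set, are said to be conditionally
independent relative to the B.F. $\mathscr{G}$, iff for any finite collection of sets $\Lambda_1, \ldots, \Lambda_n$
such that $\Lambda_j \in \mathscr{F}_{\alpha_j}$ and the $\alpha_j$'s are distinct indices from $A$, we have

$\mathscr{P}\left( \bigcap_{j=1}^{n} \Lambda_j \,\Big|\, \mathscr{G} \right) = \prod_{j=1}^{n} \mathscr{P}(\Lambda_j \mid \mathscr{G}).$

When $\mathscr{G}$ is the trivial B.F., this reduces to unconditional independence.

Theorem 9.2.1. For each $\alpha \in A$ let $\mathscr{F}^{(\alpha)}$ denote the smallest B.F. containing
all $\mathscr{F}_{\beta}$, $\beta \in A - \{\alpha\}$. Then the $\mathscr{F}_{\alpha}$'s are conditionally independent relative to
$\mathscr{G}$ if and only if for each $\alpha$ and $\Lambda_{\alpha} \in \mathscr{F}_{\alpha}$ we have

$\mathscr{P}(\Lambda_{\alpha} \mid \mathscr{F}^{(\alpha)} \vee \mathscr{G}) = \mathscr{P}(\Lambda_{\alpha} \mid \mathscr{G}),$

where $\mathscr{F}^{(\alpha)} \vee \mathscr{G}$ denotes the smallest B.F. containing $\mathscr{F}^{(\alpha)}$ and $\mathscr{G}$.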


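PROOF. It is sufficient to prove this for two B.F.'s $\mathscr{F}_1$ and $\mathscr{F}_2$, since the
general result follows by induction (how?). Suppose then that for each $\Lambda \in \mathscr{F}_1$
we have

(1) $\mathscr{P}(\Lambda \mid \mathscr{F}_2 \vee \mathscr{G}) = \mathscr{P}(\Lambda \mid \mathscr{G}).$

Let $M \in \mathscr{F}_2$, then

$\mathscr{P}(\Lambda M \mid \mathscr{G}) = \mathscr{E}\{\mathscr{P}(\Lambda M \mid \mathscr{F}_2 \vee \mathscr{G}) \mid \mathscr{G}\} = \mathscr{E}\{\mathscr{P}(\Lambda \mid \mathscr{F}_2 \vee \mathscr{G})\, 1_M \mid \mathscr{G}\}$
$= \mathscr{E}\{\mathscr{P}(\Lambda \mid \mathscr{G})\, 1_M \mid \mathscr{G}\} = \mathscr{P}(\Lambda \mid \mathscr{G})\, \mathscr{P}(M \mid \mathscr{G}),$

where the first equation follows from Theorem 9.1.5, the second and fourth
from Theorem 9.1.3, and the third from (1). Thus $\mathscr{F}_1$ and $\mathscr{F}_2$ are conditionally
independent relative to $\mathscr{G}$. Conversely, suppose the latter assertion is true, then

$\mathscr{E}\{\mathscr{P}(\Lambda \mid \mathscr{G})\, 1_M \mid \mathscr{G}\} = \mathscr{P}(\Lambda \mid \mathscr{G})\, \mathscr{P}(M \mid \mathscr{G})$
$= \mathscr{P}(\Lambda M \mid \mathscr{G}) = \mathscr{E}\{\mathscr{P}(\Lambda \mid \mathscr{F}_2 \vee \mathscr{G})\, 1_M \mid \mathscr{G}\},$

where the first equation follows from Theorem 9.1.3, the second by hypothesis,
and the third as shown above. Hence for every $\Delta \in \mathscr{G}$, we have

$\int_{M\Delta} \mathscr{P}(\Lambda \mid \mathscr{G})\, d\mathscr{P} = \int_{M\Delta} \mathscr{P}(\Lambda \mid \mathscr{F}_2 \vee \mathscr{G})\, d\mathscr{P} = \mathscr{P}(\Lambda M \Delta).$

It follows from Theorem 2.1.2 (take $\mathscr{F}_0$ to be finite disjoint unions of sets like
$M\Delta$) or more quickly from Exercise 10 of Sec. 2.1 that this remains true if
$M\Delta$ is replaced by any set in $\mathscr{F}_2 \vee \mathscr{G}$. The resulting equation implies (1), and
the theorem is proved.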


When $\mathscr{G}$ is trivial and each $\mathscr{F}_{\alpha}$ is generated by a single r.v., we have the
following corollary.


Corollary. Let $\{X_{\alpha}, \alpha \in A\}$ be an arbitrary set of r.v.'s. For each $\alpha$ let $\mathscr{F}^{(\alpha)}$
denote the Borel field generated by all the r.v.'s in the set except $X_{\alpha}$. Then
the $X_{\alpha}$'s are independent if and only if: for each $\alpha$ and each $B \in \mathscr{B}^1$, we have

$\mathscr{P}\{X_{\alpha} \in B \mid \mathscr{F}^{(\alpha)}\} = \mathscr{P}\{X_{\alpha} \in B\}$ a.e.

An equivalent form of the corollary is as follows: for each integrable r.v.
$Y$ belonging to the Borel field generated by $X_{\alpha}$, we have

(2) $\mathscr{E}\{Y \mid \mathscr{F}^{(\alpha)}\} = \mathscr{E}\{Y\}.$

This is left as an exercise.

Roughly speaking, independence among r.v.’s is equivalent to the lack 
of effect by conditioning relative to one another. However, such intuitive 
statements must be treated with caution, as shown by the following example 
which will be needed later. 

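If $\mathscr{F}_1$, $\mathscr{F}_2$, and $\mathscr{F}_3$ are three Borel fields such that $\mathscr{F}_1 \vee \mathscr{F}_2$ is independent
of $\mathscr{F}_3$, then for each integrable $X \in \mathscr{F}_1$, we have

(3) $\mathscr{E}\{X \mid \mathscr{F}_2 \vee \mathscr{F}_3\} = \mathscr{E}\{X \mid \mathscr{F}_2\}.$

Instead of a direct verification, which is left to the reader, it is interesting
to deduce this from Theorem 9.2.1 by proving the following proposition.

If $\mathscr{F}_1 \vee \mathscr{F}_2$ is independent of $\mathscr{F}_3$, then $\mathscr{F}_1$ and $\mathscr{F}_3$ are conditionally
independent relative to $\mathscr{F}_2$.

To see this, let $\Lambda_1 \in \mathscr{F}_1$, $\Lambda_3 \in \mathscr{F}_3$. Since

$\mathscr{P}(\Lambda_1 \Lambda_2 \Lambda_3) = \mathscr{P}(\Lambda_1 \Lambda_2)\, \mathscr{P}(\Lambda_3) = \int_{\Lambda_2} \mathscr{P}(\Lambda_1 \mid \mathscr{F}_2)\, \mathscr{P}(\Lambda_3)\, d\mathscr{P}$

for every $\Lambda_2 \in \mathscr{F}_2$, we have

$\mathscr{P}(\Lambda_1 \Lambda_3 \mid \mathscr{F}_2) = \mathscr{P}(\Lambda_1 \mid \mathscr{F}_2)\, \mathscr{P}(\Lambda_3) = \mathscr{P}(\Lambda_1 \mid \mathscr{F}_2)\, \mathscr{P}(\Lambda_3 \mid \mathscr{F}_2),$
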
which proves the proposition.

Next we ask: if $X_1$ and $X_2$ are independent, what is the effect of
conditioning $X_1 + X_2$ by $X_1$?


Theorem 9.2.2. Let $X_1$ and $X_2$ be independent r.v.'s with p.m.'s $\mu_1$ and $\mu_2$;
then for each $B \in \mathscr{B}^1$:

(4) $\mathscr{P}\{X_1 + X_2 \in B \mid X_1\} = \mu_2(B - X_1)$ a.e.

More generally, if $\{X_n, n \ge 1\}$ is a sequence of independent r.v.'s with p.m.'s
$\{\mu_n, n \ge 1\}$, and $S_n = \sum_{j=1}^{n} X_j$, then for each $B \in \mathscr{B}^1$:

(5) $\mathscr{P}\{S_n \in B \mid S_1, \ldots, S_{n-1}\} = \mu_n(B - S_{n-1}) = \mathscr{P}\{S_n \in B \mid S_{n-1}\}$ a.e.


PROOF. To prove (4), since its right member belongs to $\mathscr{F}\{X_1\}$, it
is sufficient to verify that it satisfies the defining relation for its left
member. Let $\Lambda \in \mathscr{F}\{X_1\}$, then $\Lambda = X_1^{-1}(A)$ for some $A \in \mathscr{B}^1$. It follows from
Theorem 3.2.2 that

$\int_{\Lambda} \mu_2(B - X_1)\, d\mathscr{P} = \int_{A} \mu_2(B - x_1)\, \mu_1(dx_1).$

Writing $\mu = \mu_1 \times \mu_2$ and applying Fubini's theorem to the right side above,
then using Theorem 3.2.3, we obtain

$\int_{A} \mu_1(dx_1) \int_{x_1 + x_2 \in B} \mu_2(dx_2) = \iint_{x_1 \in A,\ x_1 + x_2 \in B} \mu(dx_1, dx_2)$

$= \int_{\{X_1 \in A;\ X_1 + X_2 \in B\}} d\mathscr{P} = \mathscr{P}\{X_1 \in A;\ X_1 + X_2 \in B\}.$

This establishes (4).
To prove (5), we begin by observing that the second equation has just been
proved. Next we observe that since $\{X_1, \ldots, X_n\}$ and $\{S_1, \ldots, S_n\}$ obviously
generate the same Borel field, the left member of (5) is just

$\mathscr{P}\{S_n \in B \mid X_1, \ldots, X_{n-1}\}.$

Now it is trivial that as a function of $(X_1, \ldots, X_{n-1})$, $S_n$ “depends on them
only through their sum $S_{n-1}$”. It thus appears obvious that the first term in (5)
should depend only on $S_{n-1}$, namely belong to $\mathscr{F}\{S_{n-1}\}$ (rather than the larger
$\mathscr{F}\{S_1, \ldots, S_{n-1}\}$). Hence the equality of the first and third terms in (5) should
be a consequence of the assertion (13) in Theorem 9.1.5. This argument,
however, is not rigorous, and requires the following formal substantiation.

Let $\mu^{(n)} = \mu_1 \times \cdots \times \mu_n = \mu^{(n-1)} \times \mu_n$ and

$\Lambda = \bigcap_{j=1}^{n-1} \{S_j \in B_j\},$

where $B_n = B$. Sets of the form of $\Lambda$ generate the Borel field $\mathscr{F}\{S_1, \ldots, S_{n-1}\}$.
It is therefore sufficient to verify that for each such $\Lambda$, we have

$\int_{\Lambda} \mu_n(B_n - S_{n-1})\, d\mathscr{P} = \mathscr{P}\{\Lambda;\ S_n \in B_n\}.$

If we proceed as before and write $s_n = \sum_{j=1}^{n} x_j$, the left side above is equal to

$\int \cdots \int_{s_j \in B_j,\ 1 \le j \le n-1} \mu_n(B_n - s_{n-1})\, \mu^{(n-1)}(dx_1, \ldots, dx_{n-1})$

$= \int \cdots \int_{s_j \in B_j,\ 1 \le j \le n} \mu^{(n)}(dx_1, \ldots, dx_n) = \mathscr{P}\left\{ \bigcap_{j=1}^{n} [S_j \in B_j] \right\},$

as was to be shown. In the first equation above, we have made use of the fact
that the set of $(x_1, \ldots, x_n)$ in $\mathscr{R}^n$ for which $x_n \in B_n - s_{n-1}$ is exactly the set
for which $s_n \in B_n$, which is the formal counterpart of the heuristic argument
above. The theorem is proved.
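
Formula (4) can be checked exactly for discrete laws. In the sketch below the two laws `mu1` and `mu2` and the set `B` are hypothetical choices made only for illustration. The conditional probability on the left of (4) is computed from the joint law by the elementary definition, and compared with $\mu_2(B - x_1)$ for every value $x_1$ of $X_1$.

```python
from fractions import Fraction

# Hypothetical discrete laws for the independent r.v.'s X1 and X2.
mu1 = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
mu2 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
B = {2, 4}

# Joint law of (X1, X2) under independence: the product measure.
joint = {(x1, x2): p1 * p2 for x1, p1 in mu1.items() for x2, p2 in mu2.items()}


def cond_prob_sum_in_B(x1):
    """P{X1 + X2 in B | X1 = x1}, computed from the joint law."""
    num = sum(p for (u, v), p in joint.items() if u == x1 and u + v in B)
    den = sum(p for (u, v), p in joint.items() if u == x1)
    return num / den


def mu2_shifted(x1):
    """mu2(B - x1), where B - x1 = {b - x1 : b in B}."""
    return sum(mu2.get(b - x1, Fraction(0)) for b in B)


for x1 in mu1:
    assert cond_prob_sum_in_B(x1) == mu2_shifted(x1)
    print(x1, mu2_shifted(x1))
```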


The fact of the equality of the two extreme terms in (5) is a fundamental 
property of the sequence {S,}. We have indeed made frequent use of this 
in the study of sums of independent r.v.’s, particularly for a random walk 
(Chapter 8), although no explicit mention has been made of it. It would be 
instructive for the reader to review the material there and locate some instances 
of its application. As an example, we prove the following proposition, where 
the intuitive picture of conditioning is particularly clear. 


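Theorem 9.2.3. Let $\{X_n, n \ge 1\}$ be an independent (but not necessarily stationary) process such that for $A > 0$ there exists $\delta > 0$ satisfying

$$\inf_n \mathscr{P}\{X_n \ge A\} \ge \delta.$$

Then we have

$$\forall n \ge 1:\ \mathscr{P}\{S_j \in (0, A] \text{ for } 1 \le j \le n\} \le (1-\delta)^n.$$

Furthermore, given any finite interval $I$, there exists an $\epsilon > 0$ such that

$$\mathscr{P}\{S_j \in I, \text{ for } 1 \le j \le n\} \le (1-\epsilon)^n.$$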
PROOF. We write $\Lambda_n$ for the event that $S_j \in (0, A]$ for $1 \le j \le n$; then

$$\mathscr{P}\{\Lambda_n\} = \mathscr{P}\{\Lambda_{n-1};\ 0 < S_n \le A\}.$$

By the definition of conditional probability and (5), the last-written probability is equal to

$$\int_{\Lambda_{n-1}} \mathscr{P}\{0 < S_n \le A \mid S_1,\ldots,S_{n-1}\}\,d\mathscr{P}
= \int_{\Lambda_{n-1}} [F_n(A - S_{n-1}) - F_n(0 - S_{n-1})]\,d\mathscr{P},$$

where $F_n$ is the d.f. of $\mu_n$. [A quirk of notation forbids us to write the integrand on the right as $\mathscr{P}\{-S_{n-1} < X_n \le A - S_{n-1}\}$!] Now for each $\omega_0$ in $\Lambda_{n-1}$, $S_{n-1}(\omega_0) > 0$, hence

$$F_n(A - S_{n-1}(\omega_0)) \le \mathscr{P}\{X_n < A\} \le 1 - \delta$$

by hypothesis. It follows that

$$\mathscr{P}\{\Lambda_n\} \le \int_{\Lambda_{n-1}} (1-\delta)\,d\mathscr{P} = (1-\delta)\,\mathscr{P}\{\Lambda_{n-1}\}$$

and the first assertion of the theorem follows by iteration. The second is proved similarly by observing that $\mathscr{P}\{X_n + \cdots + X_{n+m-1} \ge mA\} \ge \delta^m$ and choosing $m$ so that $mA$ exceeds the length of $I$. The details are left to the reader.
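The first assertion can be checked exactly on a small example. The sketch below is only an illustration under stated assumptions: the step law (uniform on {-1, 0, 1, 2, 3}), the choice A = 2, and the helper name `prob_stay` are all hypothetical and not from the text. It propagates the law of the partial sums restricted to paths that have stayed in (0, A], which mirrors the conditioning on the past used in the proof.

```python
from fractions import Fraction

def prob_stay(step_dist, A, n):
    """Exact P{S_j in (0, A] for 1 <= j <= n} for i.i.d. integer-valued steps."""
    # alive[s] = P{S_1, ..., S_j all lie in (0, A] and S_j = s}; S_0 = 0 to start.
    alive = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for x, q in step_dist.items():
                t = s + x
                if 0 < t <= A:  # keep only paths still inside (0, A]
                    nxt[t] = nxt.get(t, Fraction(0)) + p * q
        alive = nxt
    return sum(alive.values(), Fraction(0))

# Hypothetical step law: uniform on {-1, 0, 1, 2, 3}.
steps = {x: Fraction(1, 5) for x in (-1, 0, 1, 2, 3)}
A = 2
delta = sum(q for x, q in steps.items() if x >= A)  # P{X >= A}

for n in range(1, 7):
    p = prob_stay(steps, A, n)
    print(n, p, (1 - delta) ** n, p <= (1 - delta) ** n)
```

For this particular law the probability of staying in (0, A] decays geometrically and stays below the bound for every n tried; the computation is a check on one example, not a proof.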


Let $N^0 = \{0\} \cup N$ denote the set of positive integers. For a given sequence of r.v.’s $\{X_n, n \in N^0\}$ let us denote by $\mathscr{F}_I$ the Borel field generated by $\{X_n, n \in I\}$, where $I$ is a subset of $N^0$, such as $[0, n]$, $(n, \infty)$, or $\{n\}$. Thus $\mathscr{F}_{\{n\}}$, $\mathscr{F}_{[0,n]}$, and $\mathscr{F}_{(n,\infty)}$ have been denoted earlier by $\mathscr{F}\{X_n\}$, $\mathscr{F}_n$, and $\mathscr{F}'_n$, respectively.

DEFINITION OF MARKOV PROCESS. The sequence of r.v.’s $\{X_n, n \in N^0\}$ is said to be a Markov process or to possess the “Markov property” iff for every $n \in N^0$ and every $B \in \mathscr{B}^1$, we have

$$\mathscr{P}\{X_{n+1} \in B \mid X_0,\ldots,X_n\} = \mathscr{P}\{X_{n+1} \in B \mid X_n\}. \tag{6}$$


This property may be verbally announced as: the conditional distribution (in the wide sense!) of each r.v. relative to all the preceding ones is the same as that relative to the last preceding one. Thus if $\{X_n\}$ is an independent process as defined in Chapter 8, then both the process itself and the process of the successive partial sums $\{S_n\}$ are Markov processes by Theorems 9.2.1 and 9.2.2. The latter category includes random walk as a particular case; note that in this case our notation $X_n$ rather than $S_n$ differs from that employed in Chapter 8.

Equation (6) is equivalent to the apparently stronger proposition: for every integrable $Y \in \mathscr{F}_{\{n+1\}}$, we have

$$\mathscr{E}\{Y \mid X_0,\ldots,X_n\} = \mathscr{E}\{Y \mid X_n\}. \tag{6'}$$

It is clear that (6') implies (6). To see the converse, let $Y_m$ be a sequence of simple r.v.’s belonging to $\mathscr{F}_{\{n+1\}}$ and increasing to $Y$. By (6) and property (ii) of conditional expectation in Sec. 9.1, (6') is true when $Y$ is replaced by $Y_m$; hence by property (v) there, it is also true for $Y$.
The following remark will be repeatedly used. If $Y$ and $Z$ are integrable, $Z \in \mathscr{F}_I$, and

$$\int_{\Lambda} Y\,d\mathscr{P} = \int_{\Lambda} Z\,d\mathscr{P}$$

for each $\Lambda$ of the form

$$\bigcap_{j \in I_0} X_j^{-1}(B_j),$$

where $I_0$ is an arbitrary finite subset of $I$ and each $B_j$ is an arbitrary Borel set, then $Z = \mathscr{E}(Y \mid \mathscr{F}_I)$. This follows from the uniqueness of conditional expectation and a previous remark given before Theorem 9.1.3, since finite disjoint unions of sets of this form generate a field that generates $\mathscr{F}_I$.

If the index $n$ is regarded as a discrete time parameter, as is usual in the theory of stochastic processes, then $\mathscr{F}_{[0,n]}$ is the field of “the past and the present”, while $\mathscr{F}_{(n,\infty)}$ is that of “the future”; whether the present is adjoined to the past or the future is often a matter of convenience. The Markov property just defined may be further characterized as follows.


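Theorem 9.2.4. The Markov property is equivalent to either one of the two propositions below:

$$\forall n \in N,\ M \in \mathscr{F}_{(n,\infty)}:\quad \mathscr{P}\{M \mid \mathscr{F}_{[0,n]}\} = \mathscr{P}\{M \mid X_n\}. \tag{7}$$

$$\forall n \in N,\ M_1 \in \mathscr{F}_{[0,n]},\ M_2 \in \mathscr{F}_{(n,\infty)}:\quad \mathscr{P}\{M_1 M_2 \mid X_n\} = \mathscr{P}\{M_1 \mid X_n\}\,\mathscr{P}\{M_2 \mid X_n\}. \tag{8}$$

These conclusions remain true if $\mathscr{F}_{(n,\infty)}$ is replaced by $\mathscr{F}_{[n,\infty)}$.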
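PROOF. To prove (7) implies (8), let $Y_i = 1_{M_i}$, $i = 1, 2$. We then have

$$\begin{aligned}
\mathscr{P}\{M_1 \mid X_n\}\,\mathscr{P}\{M_2 \mid X_n\} &= \mathscr{E}\{Y_1 \mid X_n\}\,\mathscr{E}\{Y_2 \mid X_n\} \\
&= \mathscr{E}\{Y_1\,\mathscr{E}(Y_2 \mid X_n) \mid X_n\} \\
&= \mathscr{E}\{Y_1\,\mathscr{E}(Y_2 \mid \mathscr{F}_{[0,n]}) \mid X_n\} \\
&= \mathscr{E}\{\mathscr{E}(Y_1 Y_2 \mid \mathscr{F}_{[0,n]}) \mid X_n\} \\
&= \mathscr{E}\{Y_1 Y_2 \mid X_n\} = \mathscr{P}\{M_1 M_2 \mid X_n\},
\end{aligned} \tag{9}$$

where the second and fourth equations follow from Theorem 9.1.3, the third from assumption (7), and the fifth from Theorem 9.1.5.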


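Conversely, to prove that (8) implies (7), let $\Lambda \in \mathscr{F}_{\{n\}}$, $M_1 \in \mathscr{F}_{[0,n)}$, $M_2 \in \mathscr{F}_{(n,\infty)}$. By the second equation in (9) applied to the fourth equation below, we have

$$\begin{aligned}
\int_{\Lambda M_1} \mathscr{P}(M_2 \mid X_n)\,d\mathscr{P} &= \int_{\Lambda M_1} \mathscr{E}(Y_2 \mid X_n)\,d\mathscr{P} = \int_{\Lambda} Y_1\,\mathscr{E}(Y_2 \mid X_n)\,d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{E}(Y_1\,\mathscr{E}(Y_2 \mid X_n) \mid X_n)\,d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{E}(Y_1 \mid X_n)\,\mathscr{E}(Y_2 \mid X_n)\,d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{P}(M_1 \mid X_n)\,\mathscr{P}(M_2 \mid X_n)\,d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{P}(M_1 M_2 \mid X_n)\,d\mathscr{P} = \mathscr{P}(\Lambda M_1 M_2).
\end{aligned}$$

Since disjoint unions of sets of the form $\Lambda M_1$ as specified above generate the Borel field $\mathscr{F}_{[0,n]}$, the uniqueness of $\mathscr{P}(M_2 \mid \mathscr{F}_{[0,n]})$ shows that it is equal to $\mathscr{P}(M_2 \mid X_n)$, proving (7).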

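Finally, we prove the equivalence of the Markov property and the proposition (7). Clearly the former is implied by the latter; to prove the converse we shall operate with conditional expectations instead of probabilities and use induction. Suppose that it has been shown that for every $n \in N$, and every bounded $f$ belonging to $\mathscr{F}_{[n+1,n+k]}$, we have

$$\mathscr{E}(f \mid \mathscr{F}_{[0,n]}) = \mathscr{E}(f \mid \mathscr{F}_{\{n\}}). \tag{10}$$

This is true for $k = 1$ by (6'). Let $g$ be bounded, $g \in \mathscr{F}_{[n+1,n+k+1]}$; we are going to show that (10) remains true when $f$ is replaced by $g$. For this purpose it is sufficient to consider a $g$ of the form $g_1 g_2$, where $g_1 \in \mathscr{F}_{[n+1,n+k]}$, $g_2 \in \mathscr{F}_{\{n+k+1\}}$, both bounded. The successive steps, in slow motion fashion, are as follows:

$$\begin{aligned}
\mathscr{E}\{g \mid \mathscr{F}_{[0,n]}\} &= \mathscr{E}\{\mathscr{E}(g \mid \mathscr{F}_{[0,n+k]}) \mid \mathscr{F}_{[0,n]}\} = \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{[0,n+k]}) \mid \mathscr{F}_{[0,n]}\} \\
&= \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{\{n+k\}}) \mid \mathscr{F}_{[0,n]}\} = \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{\{n+k\}}) \mid \mathscr{F}_{\{n\}}\} \\
&= \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{[n,n+k]}) \mid \mathscr{F}_{\{n\}}\} = \mathscr{E}\{\mathscr{E}(g_1 g_2 \mid \mathscr{F}_{[n,n+k]}) \mid \mathscr{F}_{\{n\}}\} \\
&= \mathscr{E}\{g_1 g_2 \mid \mathscr{F}_{\{n\}}\} = \mathscr{E}\{g \mid \mathscr{F}_{\{n\}}\}.
\end{aligned}$$

It is left to the reader to scrutinize each equation carefully for explanation, except that the fifth one is an immediate consequence of the Markov property $\mathscr{E}\{g_2 \mid \mathscr{F}_{\{n+k\}}\} = \mathscr{E}\{g_2 \mid \mathscr{F}_{[0,n+k]}\}$ and (13) of Sec. 9.1. This establishes (7) for $M \in \bigcup_{k=1}^{\infty} \mathscr{F}_{(n,n+k]}$, which is a field generating $\mathscr{F}_{(n,\infty)}$. Hence (7) is true (why?). The last assertion of the theorem is left as an exercise.

The property embodied in (8) may be announced as follows: “The past and the future are conditionally independent given the present”. In this form there is a symmetry that is not apparent in the other equivalent forms.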
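The statement “the past and the future are conditionally independent given the present” can be verified by direct enumeration on a finite example. The sketch below assumes a two-state chain with an arbitrarily chosen initial law `p0` and transition matrix `P` (both hypothetical), builds the joint law of (X_0, X_1, X_2), and tests the product rule in (8) with n = 1 for events of the form {X_0 = i} and {X_2 = k}.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state chain: initial law p0 and one-step transition matrix P.
p0 = [F(1, 3), F(2, 3)]
P = [[F(1, 4), F(3, 4)],
     [F(1, 2), F(1, 2)]]

# Joint law of (X_0, X_1, X_2) for this chain.
joint = {(i, j, k): p0[i] * P[i][j] * P[j][k]
         for i, j, k in product(range(2), repeat=3)}

def past_future_independent(joint, n_states=2):
    """Check P{X_0=i, X_2=k | X_1=j} = P{X_0=i | X_1=j} P{X_2=k | X_1=j} for all i, j, k."""
    for j in range(n_states):
        pj = sum(p for (a, b, c), p in joint.items() if b == j)
        if pj == 0:
            continue  # conditioning on a null event is left undefined here
        for i, k in product(range(n_states), repeat=2):
            past = sum(p for (a, b, c), p in joint.items() if a == i and b == j) / pj
            future = sum(p for (a, b, c), p in joint.items() if b == j and c == k) / pj
            if joint[(i, j, k)] / pj != past * future:
                return False
    return True

print(past_future_independent(joint))
```

The same function returns `False` for a joint law in which X_2 simply repeats X_0 while X_1 is an independent coin, a process that is not Markov.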

The Markov property has an extension known as the “strong Markov property”. In the case discussed here, where the time index is $N^0$, it is an automatic consequence of the ordinary Markov property, but it is of great conceptual importance in applications. We recall the notions of an optional r.v. $\alpha$, the r.v. $X_\alpha$, and the fields $\mathscr{F}_\alpha$ and $\mathscr{F}'_\alpha$, which are defined for a general sequence of r.v.’s in Sec. 8.2. Note that here $\alpha$ may take the value 0, which is included in $N^0$. We shall give the extension in the form (7).


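Theorem 9.2.5. Let $\{X_n, n \in N^0\}$ be a Markov process and $\alpha$ a finite optional r.v. relative to it. Then for each $M \in \mathscr{F}'_\alpha$ we have

$$\mathscr{P}\{M \mid \mathscr{F}_\alpha\} = \mathscr{P}\{M \mid \alpha, X_\alpha\}. \tag{11}$$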


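PROOF. Since $\alpha \in \mathscr{F}_\alpha$ and $X_\alpha \in \mathscr{F}_\alpha$ (Exercise 2 of Sec. 8.2), the right member above belongs to $\mathscr{F}_\alpha$. To prove (11) it is then sufficient to verify that the right member satisfies the defining relation for the left, when $M$ is of the form

$$\bigcap_{j=1}^{\ell} \{X_{\alpha+j} \in B_j\}, \qquad B_j \in \mathscr{B}^1,\ 1 \le j \le \ell;\ 1 \le \ell < \infty.$$

Put for each $n$,

$$M_n = \bigcap_{j=1}^{\ell} \{X_{n+j} \in B_j\} \in \mathscr{F}_{(n,\infty)}.$$

Now the crucial step is to show that

$$\sum_{n=0}^{\infty} \mathscr{P}\{M_n \mid X_n\}\,1_{\{\alpha = n\}} = \mathscr{P}\{M \mid \alpha, X_\alpha\}. \tag{12}$$

By the lemma in the proof of Theorem 9.1.2, there exists a Borel measurable function $\varphi_n$ such that $\mathscr{P}\{M_n \mid X_n\} = \varphi_n(X_n)$, from which it follows that the left member of (12) belongs to the Borel field generated by the two r.v.’s $\alpha$ and $X_\alpha$. Hence we shall prove (12) by verifying that its left member satisfies the defining relation for its right member, as follows. For each $m \in N$ and $B \in \mathscr{B}^1$, we have

$$\begin{aligned}
\int_{\{\alpha = m;\ X_\alpha \in B\}} \sum_{n=0}^{\infty} \mathscr{P}\{M_n \mid X_n\}\,1_{\{\alpha = n\}}\,d\mathscr{P}
&= \int_{\{\alpha = m;\ X_m \in B\}} \mathscr{P}\{M_m \mid X_m\}\,d\mathscr{P} \\
&= \int_{\{\alpha = m;\ X_m \in B\}} \mathscr{P}\{M_m \mid \mathscr{F}_{[0,m]}\}\,d\mathscr{P} = \mathscr{P}\{\alpha = m;\ X_m \in B;\ M_m\} \\
&= \mathscr{P}\{\alpha = m;\ X_\alpha \in B;\ M\},
\end{aligned}$$

where the second equation follows from an application of (7), and the third from the optionality of $\alpha$, namely

$$\{\alpha = m\} \in \mathscr{F}_{[0,m]}.$$

This establishes (12).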
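Now let $\Lambda \in \mathscr{F}_\alpha$, then [cf. (3) of Sec. 8.2] we have

$$\Lambda = \bigcup_{n=0}^{\infty} \{(\alpha = n) \cap \Lambda_n\},$$

where $\Lambda_n \in \mathscr{F}_{[0,n]}$. It follows that

$$\begin{aligned}
\mathscr{P}\{\Lambda M\} &= \sum_{n=0}^{\infty} \mathscr{P}\{\alpha = n;\ \Lambda_n;\ M_n\} = \sum_{n=0}^{\infty} \int_{(\alpha = n) \cap \Lambda_n} \mathscr{P}\{M_n \mid \mathscr{F}_{[0,n]}\}\,d\mathscr{P} \\
&= \sum_{n=0}^{\infty} \int_{\Lambda} \mathscr{P}\{M_n \mid X_n\}\,1_{\{\alpha = n\}}\,d\mathscr{P} = \int_{\Lambda} \mathscr{P}\{M \mid \alpha, X_\alpha\}\,d\mathscr{P},
\end{aligned}$$

where the third equation is by an application of (7) while the fourth is by (12). This being true for each $\Lambda$, we obtain (11). The theorem is proved.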


When $\alpha$ is a constant $n$, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the $\varphi_n$ above may be chosen to be independent of $n$).

There is a very general method of constructing a Markov process with given “transition probability functions”, as follows. Let $P_0(\cdot)$ be an arbitrary p.m. on $(\mathscr{R}^1, \mathscr{B}^1)$. For each $n \ge 1$ let $P_n(\cdot, \cdot)$ be a function of the pair $(x, B)$ where $x \in \mathscr{R}^1$ and $B \in \mathscr{B}^1$, having the measurability properties below:

(a) for each $x$, $P_n(x, \cdot)$ is a p.m. on $\mathscr{B}^1$;
(b) for each $B$, $P_n(\cdot, B) \in \mathscr{B}^1$.

It is a consequence of Kolmogorov’s extension theorem (see Sec. 3.3) that there exists a sequence of r.v.’s $\{X_n, n \in N^0\}$ on some probability space with the following “finite-dimensional joint distributions”: for each $0 \le n < \infty$, $B_j \in \mathscr{B}^1$, $0 \le j \le n$:

$$\mathscr{P}\Big\{\bigcap_{j=0}^{n} [X_j \in B_j]\Big\} = \int_{B_0} P_0(dx_0) \int_{B_1} P_1(x_0, dx_1) \cdots \int_{B_n} P_n(x_{n-1}, dx_n). \tag{13}$$

There is no difficulty in verifying that (13) yields an $(n+1)$-dimensional p.m. on $(\mathscr{R}^{n+1}, \mathscr{B}^{n+1})$ and so also on each subspace by setting some of the $B_j$’s above to be $\mathscr{R}^1$, and that the resulting collection of p.m.’s are mutually consistent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of the process $\{X_n, n \in N^0\}$ with these given “marginal distributions” is omitted here, the procedure being similar to but somewhat more sophisticated than that given in Theorem 3.3.4, which is a particular case. Assuming the existence, we will now show that it has the Markov property by verifying (6) briefly. By Theorem 9.1.5, it will be sufficient to show that one version of the left member of (6) is given by $P_{n+1}(X_n, B)$, which belongs to $\mathscr{F}_{\{n\}}$ by condition (b) above. Let then

$$\Lambda = \bigcap_{j=0}^{n} [X_j \in B_j]$$

and $\mu^{(n+1)}$ be the $(n+1)$-dimensional p.m. of the random vector $(X_0, \ldots, X_n)$. It follows from Theorem 3.2.3 and (13) used twice that

$$\begin{aligned}
\int_{\Lambda} P_{n+1}(X_n, B)\,d\mathscr{P} &= \int\!\cdots\!\int_{B_0 \times \cdots \times B_n} P_{n+1}(x_n, B)\,d\mu^{(n+1)} \\
&= \int\!\cdots\!\int_{B_0 \times \cdots \times B_n \times B} P_0(dx_0) \prod_{j=1}^{n+1} P_j(x_{j-1}, dx_j) \\
&= \mathscr{P}(\Lambda;\ X_{n+1} \in B).
\end{aligned}$$

This is what was to be shown.
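When the state space is finite, the iterated integral in (13) becomes a product along each path, and both the consistency of the finite-dimensional laws and the Markov property can be checked by enumeration. The sketch below assumes a two-state space and two hypothetical kernels `P1`, `P2` written as matrices; the function name `joint_law` is likewise only illustrative.

```python
from fractions import Fraction as F
from itertools import product

def joint_law(p0, kernels):
    """Finite-state analogue of (13): law of (X_0, ..., X_n) from P_0 and P_1, ..., P_n."""
    n_states = len(p0)
    law = {}
    for path in product(range(n_states), repeat=len(kernels) + 1):
        prob = p0[path[0]]
        for j, kernel in enumerate(kernels):
            prob *= kernel[path[j]][path[j + 1]]  # factor P_{j+1}(x_j, {x_{j+1}})
        law[path] = prob
    return law

# Hypothetical nonhomogeneous example on the states {0, 1}.
p0 = [F(1, 3), F(2, 3)]
P1 = [[F(1, 4), F(3, 4)], [F(1, 2), F(1, 2)]]
P2 = [[F(1), F(0)], [F(1, 3), F(2, 3)]]

law1 = joint_law(p0, [P1])        # law of (X_0, X_1)
law2 = joint_law(p0, [P1, P2])    # law of (X_0, X_1, X_2)

# Consistency: summing out x_2 (taking B_2 to be the whole space) recovers law1.
consistent = all(sum(law2[(i, j, k)] for k in range(2)) == law1[(i, j)]
                 for i, j in product(range(2), repeat=2))
print(consistent)
```

In this example every pair (i, j) has positive probability, so the conditional probability of {X_2 = k} given {X_0 = i, X_1 = j} is the plain ratio `law2[(i, j, k)] / law1[(i, j)]`, and it equals `P2[j][k]` whatever the value of i.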

We call $P_0(\cdot)$ the initial distribution of the Markov process and $P_n(\cdot, \cdot)$ its “nth-stage transition probability function”. The case where the latter is the same for all $n \ge 1$ is particularly important, and the corresponding Markov process is said to be “(temporally) homogeneous” or “with stationary transition probabilities”. In this case we write, with $x = x_0$:

$$P^{(n)}(x, B) = \int\!\cdots\!\int_{\mathscr{R}^1 \times \cdots \times \mathscr{R}^1 \times B} \prod_{j=0}^{n-1} P(x_j, dx_{j+1}), \tag{14}$$

and call it the “n-step transition probability function”; when $n = 1$, the qualifier “1-step” is usually dropped. We also put $P^{(0)}(x, B) = 1_B(x)$. It is easy to see that

$$P^{(n+1)}(x, B) = \int_{\mathscr{R}^1} P^{(n)}(y, B)\,P^{(1)}(x, dy), \tag{15}$$

so that all $P^{(n)}$ are just the iterates of $P^{(1)}$.
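In the special case of a finite state space (an assumption made only for this illustration; compare Exercise 11 below), the one-step function is a stochastic matrix and the iteration just described is repeated matrix multiplication. The sketch uses a hypothetical 2 x 2 matrix `P` and hypothetical helper names, and works in exact rational arithmetic.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """n-step transition matrix, obtained by iterating the one-step matrix P."""
    size = len(P)
    out = [[F(int(i == j)) for j in range(size)] for i in range(size)]  # P^(0) = identity
    for _ in range(n):
        # One more step: new(x, z) = sum over y of P(x, y) * old(y, z).
        out = matmul(P, out)
    return out

# Hypothetical one-step transition matrix on the states {0, 1}.
P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

print(n_step(P, 2))
print(n_step(P, 5) == matmul(n_step(P, 2), n_step(P, 3)))
```

The last line compares the 5-step matrix with the product of the 2-step and 3-step matrices, which is the finite-state form of the Chapman–Kolmogorov equations of Exercise 6.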
It follows from Theorem 9.2.2 that for the Markov process $\{S_n, n \in N\}$ there, we have

$$P_n(x, B) = \mu_n(B - x).$$

In particular, a random walk is a homogeneous Markov process with the 1-step transition probability function

$$P(x, B) = \mu(B - x).$$


In the homogeneous case Theorem 9.2.4 may be sharpened to read like 
Theorem 8.2.2, which becomes then a particular case. The proof is left as an 
exercise. 


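Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. $\alpha$ which is optional relative to the process, the pre-$\alpha$ and post-$\alpha$ fields are conditionally independent relative to $X_\alpha$, namely:

$$\forall \Lambda \in \mathscr{F}_\alpha,\ M \in \mathscr{F}'_\alpha:\quad \mathscr{P}\{\Lambda M \mid X_\alpha\} = \mathscr{P}\{\Lambda \mid X_\alpha\}\,\mathscr{P}\{M \mid X_\alpha\}.$$

Furthermore, the post-$\alpha$ process $\{X_{\alpha+n}, n \in N\}$ is a homogeneous Markov process with the same transition probability function as the original one.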


Given a Markov process $\{X_n, n \in N^0\}$, the distribution of $X_0$ is a p.m. and for each $B \in \mathscr{B}^1$, there is according to Theorem 9.1.2 a Borel measurable function $\varphi_n(\cdot, B)$ such that

$$\mathscr{P}\{X_{n+1} \in B \mid X_n = x\} = \varphi_n(x, B).$$

It seems plausible that the function $\varphi_n(\cdot, \cdot)$ would correspond to the n-stage transition probability function just discussed. The trouble is that while condition (b) above may be satisfied for each $B$ by a particular choice of $\varphi_n(\cdot, B)$, it is by no means clear why the resulting collection for varying $B$ would satisfy condition (a). Although it is possible to ensure this by means of conditional distributions in the wide sense alluded to in Exercise 8 of Sec. 9.1, we shall not discuss it here (see Doob [16, chap. 2]).

The theory of Markov processes is the most highly developed branch 
of stochastic processes. Special cases such as Markov chains, diffusion, and 
processes with independent increments have been treated in many mono- 
graphs, a few of which are listed in the Bibliography at the end of the book. 


EXERCISES 


*1. Prove that the Markov property is also equivalent to the following proposition: if $t_1 < \cdots < t_n < t_{n+1}$ are indices in $N^0$ and $B_j$, $1 \le j \le n+1$, are Borel sets, then

$$\mathscr{P}\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_1}, \ldots, X_{t_n}\} = \mathscr{P}\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_n}\}.$$

In this form we can define a Markov process $\{X_t\}$ with a continuous parameter $t$ ranging in $[0, \infty)$.

2. Does the Markov property imply the following (notation as in Exercise 1):

$$\mathscr{P}\{X_{n+1} \in B_{n+1} \mid X_1 \in B_1, \ldots, X_n \in B_n\} = \mathscr{P}\{X_{n+1} \in B_{n+1} \mid X_n \in B_n\}?$$

3. Unlike independence, the Markov property is not necessarily preserved by a “functional”: $\{f(X_n), n \in N^0\}$. Give an example of this, but show that it is preserved by each one-to-one Borel measurable mapping $f$.

*4. Prove the strong Markov property in the form of (8).

5. Prove Theorem 9.2.6.


*6. For the $P^{(n)}$ defined in (14), prove the “Chapman–Kolmogorov equations”:

$$\forall m \in N,\ n \in N:\quad P^{(m+n)}(x, B) = \int_{\mathscr{R}^1} P^{(m)}(x, dy)\,P^{(n)}(y, B).$$

7. Generalize the Chapman–Kolmogorov equation in the nonhomogeneous case.

*8. For the homogeneous Markov process constructed in the text, show that for each $f \ge 0$ we have

$$\mathscr{E}\{f(X_{m+n}) \mid X_m\} = \int_{\mathscr{R}^1} f(y)\,P^{(n)}(X_m, dy).$$

*9. Let $B$ be a Borel set, $f_1(x, B) = P(x, B)$, and define $f_n$ for $n \ge 2$ inductively by

$$f_n(x, B) = \int_{B^c} P(x, dy)\,f_{n-1}(y, B);$$

put $f(x, B) = \sum_{n=1}^{\infty} f_n(x, B)$. Prove that $f(X_n, B)$ is a version of the conditional probability $\mathscr{P}\{\bigcup_{j=n+1}^{\infty} [X_j \in B] \mid X_n\}$ for the homogeneous Markov process with transition probability function $P(\cdot, \cdot)$.

*10. Using the $f$ defined in Exercise 9, put

$$g(x, B) = f(x, B) - \sum_{n=1}^{\infty} \int_{B} P^{(n)}(x, dy)\,[1 - f(y, B)].$$

Prove that $g(X_n, B)$ is a version of the conditional probability

$$\mathscr{P}\{\limsup_j\, [X_j \in B] \mid X_n\}.$$
J 


*11. Suppose that for a homogeneous Markov process the initial distribution has support in $N^0$ as a subset of $\mathscr{R}^1$, and that for each $i \in N^0$, the transition probability function $P(i, \cdot)$ also has support in $N^0$. Thus

$$\{P(i, j);\ (i, j) \in N^0 \times N^0\}$$

is an infinite matrix called the “transition matrix”. Show that $P^{(n)}$ as a matrix is just the nth power of $P^{(1)}$. Express the probability $\mathscr{P}\{X_{t_k} = i_k, 1 \le k \le n\}$ in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]

12. A process $\{X_n, n \in N^0\}$ is said to possess the “rth-order Markov property”, where $r \ge 1$, iff (6) is replaced by

$$\mathscr{P}\{X_{n+1} \in B \mid X_0, \ldots, X_n\} = \mathscr{P}\{X_{n+1} \in B \mid X_{n-r+1}, \ldots, X_n\}$$

for $n \ge r - 1$. Show that if $r < s$, then the rth-order Markov property implies the sth. The ordinary Markov property is the case $r = 1$.

13. Let $Y_n$ be the random vector $(X_n, X_{n+1}, \ldots, X_{n+r-1})$. Then the vector process $\{Y_n, n \in N^0\}$ has the ordinary Markov property (trivially generalized to vectors) if and only if $\{X_n, n \in N^0\}$ has the rth-order Markov property.

14. Let $\{X_n, n \in N^0\}$ be an independent process. Let

$$S_n^{(1)} = \sum_{j=0}^{n} X_j, \qquad S_n^{(r+1)} = \sum_{j=0}^{n} S_j^{(r)}$$

for $r \ge 1$. Then $\{S_n^{(r)}, n \in N^0\}$ has the rth-order Markov property. For $r = 2$, give an example to show that it need not be a Markov process.

15. If $\{S_n, n \in N\}$ is a random walk such that $\mathscr{P}\{S_1 \ne 0\} > 0$, then for any finite interval $[a, b]$ there exists an $\epsilon < 1$ such that

$$\mathscr{P}\{S_j \in [a, b],\ 1 \le j \le n\} \le \epsilon^n.$$

[This is just Exercise 6 of Sec. 5.5 again.]

16. The same conclusion is true if the random walk above is replaced by a homogeneous Markov process for which, e.g., there exist $\delta > 0$ and $\eta > 0$ such that $P(x, \mathscr{R}^1 - (x - \delta, x + \delta)) \ge \eta$ for every $x$.


9.3 Basic properties of smartingales 
The sequence of sums of independent r.v.’s has motivated the generalization to a Markov process in the preceding section; in another direction it will now motivate a martingale. Changing our previous notation to conform with later usage, let $\{x_n, n \in N\}$ denote independent r.v.’s with mean zero and write $X_n = \sum_{j=1}^{n} x_j$ for the partial sum. Then we have

$$\begin{aligned}
\mathscr{E}(X_{n+1} \mid x_1, \ldots, x_n) &= \mathscr{E}(X_n + x_{n+1} \mid x_1, \ldots, x_n) \\
&= X_n + \mathscr{E}(x_{n+1} \mid x_1, \ldots, x_n) = X_n + \mathscr{E}(x_{n+1}) = X_n.
\end{aligned}$$

Note that the conditioning with respect to $x_1, \ldots, x_n$ may be replaced by conditioning with respect to $X_1, \ldots, X_n$ (why?). Historically, the equation above led to the consideration of dependent r.v.’s $\{x_n\}$ satisfying the condition

$$\mathscr{E}(x_{n+1} \mid x_1, \ldots, x_n) = 0. \tag{1}$$


It is astonishing that this simple property should delineate such a useful class of stochastic processes which will now be introduced. In what follows, where the index set for $n$ is not specified, it is understood to be either $N$ or some initial segment $N_m$ of $N$.


DEFINITION OF MARTINGALE. The sequence of r.v.’s and B.F.’s $\{X_n, \mathscr{F}_n\}$ is called a martingale iff we have for each $n$:

(a) $\mathscr{F}_n \subset \mathscr{F}_{n+1}$ and $X_n \in \mathscr{F}_n$;
(b) $\mathscr{E}(|X_n|) < \infty$;
(c) $X_n = \mathscr{E}(X_{n+1} \mid \mathscr{F}_n)$, a.e.

It is called a supermartingale iff the “=” in (c) above is replaced by “≥”, and a submartingale iff it is replaced by “≤”. For abbreviation we shall use the term smartingale to cover all three varieties. In case $\mathscr{F}_n = \mathscr{F}_{[1,n]}$ as defined in Sec. 9.2, we shall omit $\mathscr{F}_n$ and write simply $\{X_n\}$; more frequently however we shall consider $\{\mathscr{F}_n\}$ as given in advance and omitted from the notation.

Condition (a) is nowadays referred to as: $\{X_n\}$ is adapted to $\{\mathscr{F}_n\}$. Condition (b) says that all the r.v.’s are integrable; we shall have to impose stronger conditions to obtain most of our results. A particularly important one is the uniform integrability of the sequence $\{X_n\}$, which is discussed in Sec. 4.5. A weaker condition is given by

$$\sup_n \mathscr{E}(|X_n|) < \infty; \tag{2}$$

when this is satisfied we shall say that $\{X_n\}$ is $L^1$-bounded. Condition (c) leads at once to the more general relation:

$$n < m \ \Rightarrow\ X_n = \mathscr{E}(X_m \mid \mathscr{F}_n). \tag{3}$$

This follows from Theorem 9.1.5 by induction since

$$\mathscr{E}(X_m \mid \mathscr{F}_n) = \mathscr{E}(\mathscr{E}(X_m \mid \mathscr{F}_{m-1}) \mid \mathscr{F}_n) = \mathscr{E}(X_{m-1} \mid \mathscr{F}_n).$$

An equivalent form of (3) is as follows: for each $\Lambda \in \mathscr{F}_n$ and $n \le m$, we have

$$\int_{\Lambda} X_n\,d\mathscr{P} = \int_{\Lambda} X_m\,d\mathscr{P}. \tag{4}$$

It is often safer to use the explicit formula (4) rather than (3), because conditional expectations can be slippery things to handle. We shall refer to (3) or (4) as the defining relation of a martingale; similarly for the “super” and “sub” varieties.

Let us observe that in the form (3) or (4), the definition of a smartingale is meaningful if the index set $N$ is replaced by any linearly ordered set, with “<” as the strict order. For instance, it may be an interval or the set of rational numbers in the interval. But even if we confine ourselves to a discrete parameter (as we shall do) there are other index sets to be considered below.

It is scarcely worth mentioning that $\{X_n\}$ is a supermartingale if and only if $\{-X_n\}$ is a submartingale, and that a martingale is both. However the extension of results from a martingale to a smartingale is not always trivial, nor is it done for the sheer pleasure of generalization. For it is clear that martingales are harder to come by than the other varieties. As between the super and sub cases, though we can pass from one to the other by simply changing signs, our force of habit may influence the choice. The next proposition is a case in point.


Theorem 9.3.1. Let $\{X_n, \mathscr{F}_n\}$ be a submartingale and let $\varphi$ be an increasing convex function defined on $\mathscr{R}^1$. If $\varphi(X_n)$ is integrable for every $n$, then $\{\varphi(X_n), \mathscr{F}_n\}$ is also a submartingale.

PROOF. Since $\varphi$ is increasing, and

$$X_n \le \mathscr{E}\{X_{n+1} \mid \mathscr{F}_n\},$$

we have

$$\varphi(X_n) \le \varphi(\mathscr{E}\{X_{n+1} \mid \mathscr{F}_n\}). \tag{5}$$

By Jensen’s inequality (Sec. 9.1), the right member above does not exceed $\mathscr{E}\{\varphi(X_{n+1}) \mid \mathscr{F}_n\}$; this proves the theorem. As forewarned in 9.1, we have left out some “a.e.” above and shall continue to do so.

Corollary 1. If $\{X_n, \mathscr{F}_n\}$ is a submartingale, then so is $\{X_n^+, \mathscr{F}_n\}$. Thus $\mathscr{E}(X_n^+)$ as well as $\mathscr{E}(X_n)$ is increasing with $n$.

Corollary 2. If $\{X_n, \mathscr{F}_n\}$ is a martingale, then $\{|X_n|, \mathscr{F}_n\}$ is a submartingale; and $\{|X_n|^p, \mathscr{F}_n\}$, $1 < p < \infty$, is a submartingale provided that every $X_n \in L^p$; similarly for $\{|X_n| \log^+ |X_n|, \mathscr{F}_n\}$ where $\log^+ x = (\log x) \vee 0$ for $x \ge 0$.

PROOF. For a martingale we have equality in (5) for any convex $\varphi$, hence we may take $\varphi(x) = |x|$, $|x|^p$ or $|x| \log^+ |x|$ in the proof above.

Thus for a martingale $\{X_n\}$, all three transmutations: $\{X_n^+\}$, $\{X_n^-\}$ and $\{|X_n|\}$ are submartingales. For a submartingale $\{X_n\}$, nothing is said about the last two.
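For a process driven by finitely many coin tosses, the defining relation can be checked atom by atom, since the conditional expectation given the first n tosses is a plain average over the next toss. The sketch below is an illustration under assumptions of its own: the tosses are i.i.d. with values +1 and -1, the process is indexed from n = 0 (the empty prefix) rather than from 1, and the function name `classify` is hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def classify(process, horizon, p=F(1, 2)):
    """Compare X_n with E(X_{n+1} | F_n) on every atom of F_n for n < horizon.

    `process(prefix)` returns X_n as a function of the first n tosses (+1 or -1),
    and p is the probability of a +1 toss.
    """
    seen = set()
    for n in range(horizon):
        for prefix in product((1, -1), repeat=n):
            cond = p * process(prefix + (1,)) + (1 - p) * process(prefix + (-1,))
            diff = cond - process(prefix)
            seen.add((diff > 0) - (diff < 0))  # sign of E(X_{n+1} | F_n) - X_n
    if seen <= {0}:
        return "martingale"
    if seen <= {0, 1}:
        return "submartingale"
    if seen <= {0, -1}:
        return "supermartingale"
    return "none"

walk = lambda w: sum(w)  # partial sums of the tosses

print(classify(walk, 6))                          # fair tosses
print(classify(lambda w: abs(sum(w)), 6))         # |X_n|
print(classify(lambda w: max(sum(w), 0), 6))      # X_n^+
print(classify(walk, 6, p=F(1, 3)))               # tosses biased downward
```

With fair tosses the walk is classified as a martingale while its absolute value and positive part come out as submartingales, in line with Corollaries 1 and 2; the check only covers the finitely many atoms up to the chosen horizon.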


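Corollary 3. If $\{X_n, \mathscr{F}_n\}$ is a supermartingale, then so is $\{X_n \wedge A, \mathscr{F}_n\}$ where $A$ is any constant.

PROOF. We leave it to the reader to deduce this from the theorem, but here is a quick direct proof:

$$X_n \wedge A \ge \mathscr{E}(X_{n+1} \mid \mathscr{F}_n) \wedge \mathscr{E}(A \mid \mathscr{F}_n) \ge \mathscr{E}(X_{n+1} \wedge A \mid \mathscr{F}_n).$$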


It is possible to represent any smartingale as a martingale plus or minus something special. Let us call a sequence of r.v.’s $\{Z_n, n \in N\}$ an increasing process iff it satisfies the conditions:

(i) $Z_1 = 0$; $Z_n \le Z_{n+1}$ for $n \ge 1$;
(ii) $\mathscr{E}(Z_n) < \infty$ for each $n$.

It follows that $Z_\infty = \lim_n \uparrow Z_n$ exists but may take the value $+\infty$; $Z_\infty$ is integrable if and only if $\{Z_n\}$ is $L^1$-bounded as defined above, which means here $\lim_n \uparrow \mathscr{E}(Z_n) < \infty$. This is also equivalent to the uniform integrability of $\{Z_n\}$ because of (i). We can now state the result as follows.


Theorem 9.3.2. Any submartingale $\{X_n, \mathscr{F}_n\}$ can be written as

$$X_n = Y_n + Z_n, \tag{6}$$

where $\{Y_n, \mathscr{F}_n\}$ is a martingale, and $\{Z_n\}$ is an increasing process.

PROOF. From $\{X_n\}$ we define its difference sequence as follows:

$$x_1 = X_1, \qquad x_n = X_n - X_{n-1}, \quad n \ge 2, \tag{7}$$

so that $X_n = \sum_{j=1}^{n} x_j$, $n \ge 1$ (cf. the notation in the first paragraph of this section). The defining relation for a submartingale then becomes

$$\mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\} \ge 0,$$

with equality for a martingale. Furthermore, we put

$$y_1 = x_1, \qquad y_n = x_n - \mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\}, \qquad Y_n = \sum_{j=1}^{n} y_j;$$

$$z_1 = 0, \qquad z_n = \mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\}, \qquad Z_n = \sum_{j=1}^{n} z_j.$$

Then clearly $x_n = y_n + z_n$ and (6) follows by addition. To show that $\{Y_n, \mathscr{F}_n\}$ is a martingale, we may verify that $\mathscr{E}\{y_n \mid \mathscr{F}_{n-1}\} = 0$ as indicated a moment ago, and this is trivial by Theorem 9.1.5. Since each $z_n \ge 0$, it is equally obvious that $\{Z_n\}$ is an increasing process. The theorem is proved.
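The construction in the proof is explicit enough to be carried out by hand or by machine when the process is a function of finitely many coin tosses. The sketch below assumes i.i.d. tosses with values +1 and -1 and takes X_n to be the square of their partial sum, a submartingale for fair tosses; the function name `doob` and this choice of X_n are illustrative assumptions, not part of the text.

```python
from fractions import Fraction as F

def doob(process, prefix, p=F(1, 2)):
    """Return (Y_n, Z_n) with X_n = Y_n + Z_n along the toss sequence `prefix`.

    `process(prefix)` gives X_n from the first n >= 1 tosses (+1 or -1),
    and p is the probability of a +1 toss.
    """
    z = F(0)  # Z_1 = z_1 = 0
    for k in range(1, len(prefix)):
        head = prefix[:k]
        # z_{k+1} = E(X_{k+1} - X_k | F_k): average over the next toss, minus X_k.
        cond = p * process(head + (1,)) + (1 - p) * process(head + (-1,))
        z += cond - process(head)
    return process(prefix) - z, z

square = lambda w: sum(w) ** 2  # X_n = (sum of the first n tosses)^2

print(doob(square, (1, 1, -1)))
```

For fair tosses each increment z_k with k ≥ 2 equals 1 here, so that Z_n = n - 1 and Y_n = X_n - (n - 1); note that the value of Z_{n+1} is computed from the first n tosses only.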


Observe that $Z_n \in \mathscr{F}_{n-1}$ for each $n$, by definition. This has important consequences; see Exercise 9 below. The decomposition (6) will be called Doob’s decomposition. For a supermartingale we need only change the “+” there into “−”, since $\{-Y_n, \mathscr{F}_n\}$ is a martingale. The following complement is useful.


Corollary. If $\{X_n\}$ is $L^1$-bounded [or uniformly integrable], then both $\{Y_n\}$ and $\{Z_n\}$ are $L^1$-bounded [or uniformly integrable].

PROOF. We have from (6):

$$\mathscr{E}(Z_n) \le \mathscr{E}(|X_n|) - \mathscr{E}(Y_1)$$

since $\mathscr{E}(Y_n) = \mathscr{E}(Y_1)$. Since $Z_n \ge 0$ this shows that if $\{X_n\}$ is $L^1$-bounded, then so is $\{Z_n\}$; and $\{Y_n\}$ is too because

$$\mathscr{E}(|Y_n|) \le \mathscr{E}(|X_n|) + \mathscr{E}(Z_n).$$

Next if $\{X_n\}$ is uniformly integrable, then it is $L^1$-bounded by Theorem 4.5.3, hence $\{Z_n\}$ is $L^1$-bounded and therefore uniformly integrable as remarked before. The uniform integrability of $\{Y_n\}$ then follows from the last-written inequality.


We come now to the fundamental notion of optional sampling of a smartingale. This consists in substituting certain random variables for the original index $n$ regarded as the time parameter of the process. Although this kind of thing has been done in Chapter 8, we will reintroduce it here in a slightly different way for the convenience of the reader. To begin with we adjoin a last index $\infty$ to the set $N$ and call it $N_\infty = \{1, 2, \ldots, \infty\}$. This is an example of a linearly ordered set mentioned above. Next, adjoin $\mathscr{F}_\infty = \bigvee_{n=1}^{\infty} \mathscr{F}_n$ to $\{\mathscr{F}_n\}$.

A r.v. $\alpha$ taking values in $N_\infty$ is called optional (relative to $\{\mathscr{F}_n, n \in N_\infty\}$) iff for every $n \in N_\infty$ we have

$$\{\alpha \le n\} \in \mathscr{F}_n. \tag{8}$$

Since $\mathscr{F}_n$ increases with $n$, the condition in (8) is unchanged if $\{\alpha \le n\}$ is replaced by $\{\alpha = n\}$. Next, for an optional $\alpha$, the pre-$\alpha$ field $\mathscr{F}_\alpha$ is defined to be the class of all subsets $\Lambda$ of $\mathscr{F}_\infty$ satisfying the following condition: for each $n \in N_\infty$ we have

$$\Lambda \cap \{\alpha \le n\} \in \mathscr{F}_n, \tag{9}$$

where again $\{\alpha \le n\}$ may be replaced by $\{\alpha = n\}$. Writing then

$$\Lambda_n = \Lambda \cap \{\alpha = n\}, \tag{10}$$

we have $\Lambda_n \in \mathscr{F}_n$ and

$$\Lambda = \bigcup_n \Lambda_n = \bigcup_n [\{\alpha = n\} \cap \Lambda_n]$$

where the index $n$ ranges over $N_\infty$. This is (3) of Sec. 8.2. The reader should now do Exercises 1 to 4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: $\mathscr{F}_\alpha$ is a B.F. and $\alpha \in \mathscr{F}_\alpha$; if $\alpha$ is optional then so is $\alpha \wedge n$ for each $n \in N$; if $\alpha \le \beta$ where $\beta$ is also optional then $\mathscr{F}_\alpha \subset \mathscr{F}_\beta$; in particular $\mathscr{F}_{\alpha \wedge n} \subset \mathscr{F}_\alpha \cap \mathscr{F}_n$ and in fact this inclusion is an equation.

Next we assume $X_\infty$ has been defined and $X_\infty \in \mathscr{F}_\infty$. We then define $X_\alpha$ as follows:

$$X_\alpha(\omega) = X_{\alpha(\omega)}(\omega); \tag{11}$$

in other words,

$$X_\alpha(\omega) = X_n(\omega) \quad \text{on } \{\alpha = n\}, \quad n \in N_\infty.$$

This definition makes sense for any $\alpha$ taking values in $N_\infty$, but for an optional $\alpha$ we can assert moreover that

$$X_\alpha \in \mathscr{F}_\alpha. \tag{12}$$

This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption $X_n \in \mathscr{F}_n$ for every $n$. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.’s for constants. For this purpose conditions must sometimes be imposed either on $\alpha$ or on the smartingale $\{X_n\}$. Let us however begin with a perfect case which turns out to be also very important.

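We introduce a class of martingales as follows. For any integrable r.v. $Y$ we put

$$X_n = \mathscr{E}(Y \mid \mathscr{F}_n), \quad n \in N_\infty. \tag{13}$$

By Theorem 9.1.5, if $n \le m$:

$$X_n = \mathscr{E}\{\mathscr{E}(Y \mid \mathscr{F}_m) \mid \mathscr{F}_n\} = \mathscr{E}\{X_m \mid \mathscr{F}_n\} \tag{14}$$

which shows $\{X_n, \mathscr{F}_n\}$ is a martingale, not only on $N$ but also on $N_\infty$. The following properties extend both (13) and (14) to optional times.

Theorem 9.3.3. For any optional $\alpha$, we have

$$X_\alpha = \mathscr{E}(Y \mid \mathscr{F}_\alpha). \tag{15}$$

If $\alpha \le \beta$ where $\beta$ is also optional, then $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ forms a two-term martingale.

PROOF. Let us first show that $X_\alpha$ is integrable. It follows from (13) and Jensen’s inequality that

$$|X_n| \le \mathscr{E}(|Y| \mid \mathscr{F}_n).$$

Since $\{\alpha = n\} \in \mathscr{F}_n$, we may apply this to get

$$\int_{\Omega} |X_\alpha|\,d\mathscr{P} = \sum_n \int_{\{\alpha = n\}} |X_n|\,d\mathscr{P} \le \sum_n \int_{\{\alpha = n\}} |Y|\,d\mathscr{P} = \int_{\Omega} |Y|\,d\mathscr{P} < \infty.$$

Next if $\Lambda \in \mathscr{F}_\alpha$, we have, using the notation in (10):

$$\int_{\Lambda} X_\alpha\,d\mathscr{P} = \sum_n \int_{\Lambda_n} X_n\,d\mathscr{P} = \sum_n \int_{\Lambda_n} Y\,d\mathscr{P} = \int_{\Lambda} Y\,d\mathscr{P},$$

where the second equation holds by (13) because $\Lambda_n \in \mathscr{F}_n$. This establishes (15). Now if $\alpha \le \beta$, then $\mathscr{F}_\alpha \subset \mathscr{F}_\beta$ and consequently by Theorem 9.1.5,

$$X_\alpha = \mathscr{E}\{\mathscr{E}(Y \mid \mathscr{F}_\beta) \mid \mathscr{F}_\alpha\} = \mathscr{E}\{X_\beta \mid \mathscr{F}_\alpha\},$$

which proves the second assertion of the theorem.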


As an immediate corollary, if $\{\alpha_n\}$ is a sequence of optional r.v.’s such that

$$\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n \le \cdots, \tag{16}$$

then $\{X_{\alpha_n}, \mathscr{F}_{\alpha_n}\}$ is a martingale. This new martingale is obtained by sampling the original one at the optional times $\{\alpha_j\}$. We now proceed to extend the second part of Theorem 9.3.3 to a supermartingale. There are two important cases which will be discussed separately.


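Theorem 9.3.4. Let $\alpha$ and $\beta$ be two bounded optional r.v.’s such that $\alpha \le \beta$. Then for any [super]martingale $\{X_n\}$, $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ forms a [super]martingale.

PROOF. Let $\Lambda \in \mathscr{F}_\alpha$; using (10) again we have for each $k \ge j$:

$$\Lambda_j \cap \{\beta > k\} \in \mathscr{F}_k$$

because $\Lambda_j \in \mathscr{F}_j \subset \mathscr{F}_k$, whereas $\{\beta > k\} = \{\beta \le k\}^c \in \mathscr{F}_k$. It follows from the defining relation of a supermartingale that

$$\int_{\Lambda_j \cap \{\beta > k\}} X_k\,d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta > k\}} X_{k+1}\,d\mathscr{P}$$

and consequently

$$\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\,d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta = k\}} X_k\,d\mathscr{P} + \int_{\Lambda_j \cap \{\beta > k\}} X_{k+1}\,d\mathscr{P}.$$

Rewriting this as

$$\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\,d\mathscr{P} - \int_{\Lambda_j \cap \{\beta \ge k+1\}} X_{k+1}\,d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta = k\}} X_\beta\,d\mathscr{P},$$

summing over $k$ from $j$ to $m$, where $m$ is an upper bound for $\beta$; and then replacing $X_j$ by $X_\alpha$ on $\Lambda_j$, we obtain

$$\int_{\Lambda_j \cap \{\beta \ge j\}} X_\alpha\,d\mathscr{P} - \int_{\Lambda_j \cap \{\beta \ge m+1\}} X_{m+1}\,d\mathscr{P} \ge \int_{\Lambda_j \cap \{j \le \beta \le m\}} X_\beta\,d\mathscr{P}, \tag{17}$$

$$\int_{\Lambda_j} X_\alpha\,d\mathscr{P} \ge \int_{\Lambda_j} X_\beta\,d\mathscr{P}.$$

Another summation over $j$ from 1 to $m$ yields the desired result. In the case of a martingale the inequalities above become equations.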


A particular case of a bounded optional r.v. is $\alpha_n = \alpha \wedge n$ where $\alpha$ is an arbitrary optional r.v. and $n$ is a positive integer. Applying the preceding theorem to the sequence $\{\alpha_n\}$ as under Theorem 9.3.3, we have the following corollary.

Corollary. If $\{X_n, \mathscr{F}_n\}$ is a [super]martingale and $\alpha$ is an arbitrary optional r.v., then $\{X_{\alpha \wedge n}, \mathscr{F}_{\alpha \wedge n}\}$ is a [super]martingale.
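One consequence of the Corollary is that, for a martingale, the expectation of the stopped variable does not depend on n. The sketch below checks this exactly for a concrete case chosen for illustration: a symmetric walk with steps +1 and -1 started at 0, stopped at the first time it reaches -a or b. The starting point, the barriers, and the function name `stopped_law` are assumptions of this example.

```python
from fractions import Fraction as F

def stopped_law(a, b, n):
    """Exact law of the walk at time (stopping time) ∧ n, for integers a, b >= 1.

    The walk starts at 0, moves +1 or -1 with probability 1/2 each,
    and is frozen as soon as it reaches -a or b.
    """
    law = {0: F(1)}
    for _ in range(n):
        nxt = {}
        for s, p in law.items():
            if s in (-a, b):
                nxt[s] = nxt.get(s, F(0)) + p          # already stopped: stay put
            else:
                for t in (s - 1, s + 1):
                    nxt[t] = nxt.get(t, F(0)) + p / 2  # one more fair step
        law = nxt
    return law

def mean(law):
    return sum(s * p for s, p in law.items())

for n in (0, 1, 5, 20):
    print(n, mean(stopped_law(2, 3, n)))
```

The printed means are all equal to the starting value 0, as the martingale property of the stopped process requires; the law itself keeps shifting mass toward the two barriers as n grows.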


In the next basic theorem we shall assume that the [super]martingale is given on the index set $N_\infty$. This is necessary when the optional r.v. can take the value $+\infty$, as required in many applications; see the typical example in (5) of Sec. 8.2. It turns out that if $\{X_n\}$ is originally given only for $n \in N$, we may take $X_\infty = \lim_{n \to \infty} X_n$ to extend it to $N_\infty$ under certain conditions, see Theorems 9.4.5 and 9.4.6 and Exercise 6 of Sec. 9.4. A trivial case occurs when $\{X_n, \mathscr{F}_n; n \in N\}$ is a positive supermartingale; we may then take $X_\infty = 0$.

Theorem 9.3.5. Let $\alpha$ and $\beta$ be two arbitrary optional r.v.’s such that $\alpha \le \beta$. Then the conclusion of Theorem 9.3.4 holds true for any supermartingale $\{X_n, \mathscr{F}_n; n \in N_\infty\}$.

Remark. For a martingale $\{X_n, \mathscr{F}_n; n \in N_\infty\}$ this theorem is contained in Theorem 9.3.3 since we may take the $Y$ in (13) to be $X_\infty$ here.

PROOF. (a) Suppose first that the supermartingale is positive with $X_\infty = 0$ a.e. The inequality (17) is true for every $m \in N$, but now the second integral there is positive so that we have

$$\int_{\Lambda_j} X_\alpha\,d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta \le m\}} X_\beta\,d\mathscr{P}.$$

Since the integrands are positive, the integrals exist and we may let $m \to \infty$ and then sum over $j \in N$. The result is

$$\int_{\Lambda \cap \{\alpha < \infty\}} X_\alpha\,d\mathscr{P} \ge \int_{\Lambda \cap \{\beta < \infty\}} X_\beta\,d\mathscr{P}$$

which falls short of the goal. But we can add the inequality

$$\int_{\Lambda \cap \{\alpha = \infty\}} X_\alpha\,d\mathscr{P} = \int_{\Lambda \cap \{\alpha = \infty\}} X_\infty\,d\mathscr{P} \ge \int_{\Lambda \cap \{\beta = \infty\}} X_\infty\,d\mathscr{P} = \int_{\Lambda \cap \{\beta = \infty\}} X_\beta\,d\mathscr{P}$$

which is trivial because $X_\infty = 0$ a.e. This yields the desired

$$\int_{\Lambda} X_\alpha\,d\mathscr{P} \ge \int_{\Lambda} X_\beta\,d\mathscr{P}. \tag{18}$$

Let us show that $X_\alpha$ and $X_\beta$ are in fact integrable. Since $X_n \ge X_\infty$ we have $X_\alpha \le \liminf_{n \to \infty} X_{\alpha \wedge n}$ so that by Fatou’s lemma,

$$\mathscr{E}(X_\alpha) \le \liminf_{n \to \infty} \mathscr{E}(X_{\alpha \wedge n}). \tag{19}$$

Since 1 and $\alpha \wedge n$ are two bounded optional r.v.’s satisfying $1 \le \alpha \wedge n$, the right-hand side of (19) does not exceed $\mathscr{E}(X_1)$ by Theorem 9.3.4. This shows $X_\alpha$ is integrable since it is positive.

(b) In the general case we put

$$X'_n = \mathscr{E}(X_\infty \mid \mathscr{F}_n), \qquad X''_n = X_n - X'_n.$$

Then $\{X'_n, \mathscr{F}_n; n \in N_\infty\}$ is a martingale of the kind introduced in (13), and $X_n \ge X'_n$ by the defining property of supermartingale applied to $X_n$ and $X_\infty$. Hence the difference $\{X''_n, \mathscr{F}_n; n \in N\}$ is a positive supermartingale with $X''_\infty = 0$ a.e. By Theorem 9.3.3, $\{X'_\alpha, \mathscr{F}_\alpha; X'_\beta, \mathscr{F}_\beta\}$ forms a martingale; by case (a), $\{X''_\alpha, \mathscr{F}_\alpha; X''_\beta, \mathscr{F}_\beta\}$ forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.


The two preceding theorems are the basic cases of Doob’s optional sampling theorem. They do not cover all cases of optional sampling (see e.g. Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many applications, some of which will be given later.

Martingale theory has its intuitive background in gambling. If $X_n$ is interpreted as the gambler’s capital at time $n$, then the defining property postulates that his expected capital after one more game, played with the knowledge of the entire past and present, is exactly equal to his current capital. In other words, his expected gain is zero, and in this sense the game is said to be “fair”. Similarly a smartingale is a game consistently biased in one direction. Now the gambler may opt to play the game only at certain preferred times, chosen with the benefit of past experience and present observation, but without clairvoyance into the future. [The inclusion of the present status in his knowledge seems to violate raw intuition, but examine the example below and Exercise 13.] He hopes of course to gain advantage by devising such a “system” but Doob’s theorem forestalls him, at least mathematically. We have already mentioned such an interpretation in Sec. 8.2 (see in particular Exercise 11 of Sec. 8.2; note that $\alpha + 1$ rather than $\alpha$ is the optional time there.) The present generalization consists in replacing a stationary independent process by a smartingale. The classical problem of “gambler’s ruin” illustrates very well the ideas involved, as follows.

Let $\{S_n, n \in \mathbf{N}^0\}$ be a random walk in the notation of Chapter 8, and let $S_1$
have the Bernoullian distribution $\frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$. It follows from Theorem 8.3.4,
or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost
certainly leave the interval $[-a, b]$, where $a$ and $b$ are strictly positive integers;
and since it can move only one unit at a time, it must reach either $-a$ or $b$. This
means that if we set

(20) $$\alpha = \min\{n \ge 1: S_n = -a\}, \qquad \beta = \min\{n \ge 1: S_n = b\},$$

then $\gamma = \alpha \wedge \beta$ is a finite optional r.v. It follows from the Corollary to
Theorem 9.3.4 that $\{S_{\gamma \wedge n}\}$ is a martingale. Now

(21) $$\lim_{n \to \infty} S_{\gamma \wedge n} = S_\gamma \quad \text{a.e.}$$

and clearly $S_\gamma$ takes only the values $-a$ and $b$. The question is: with what
probabilities? In the gambling interpretation: if two gamblers play a fair
coin-tossing game and possess, respectively, $a$ and $b$ units of the constant stake as
initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first
that the two r.v.'s $\{S_1, S_\gamma\}$ form a martingale, for then

(22) $$\mathscr{E}(S_\gamma) = \mathscr{E}(S_1) = 0,$$

which is to say that

$$-a\,\mathscr{P}\{S_\gamma = -a\} + b\,\mathscr{P}\{S_\gamma = b\} = 0,$$

so that the probability of ruin is inversely proportional to the initial capital of
the gambler, a most sensible solution.
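To make the last step explicit, here is the short computation behind the ruin probabilities. The symbol $p$ is introduced only for this sketch; it stands for $\mathscr{P}\{S_\gamma = -a\}$, the probability that the gambler with initial capital $a$ is ruined.

```latex
\begin{align*}
-a\,p + b\,(1-p) &= 0
  && \text{(equation (22), since $S_\gamma \in \{-a, b\}$ a.e.)} \\
p = \mathscr{P}\{S_\gamma = -a\} &= \frac{b}{a+b}, \qquad
1 - p = \mathscr{P}\{S_\gamma = b\} = \frac{a}{a+b}.
\end{align*}
```

The two ruin probabilities are therefore in the ratio $b : a$, the inverse of the ratio of the initial capitals.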

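To show that the pair $\{S_1, S_\gamma\}$ forms a martingale we use Theorem 9.3.5
since $\{S_{\gamma \wedge n}, n \in \mathbf{N}_\infty\}$ is a bounded martingale. The more elementary
Theorem 9.3.4 is inapplicable, since $\gamma$ is not bounded. However, there is
a simpler way out in this case: (21) and the boundedness just mentioned
imply that

$$\mathscr{E}(S_\gamma) = \lim_{n \to \infty} \mathscr{E}(S_{\gamma \wedge n});$$

and since $\mathscr{E}(S_{\gamma \wedge n}) = \mathscr{E}(S_{\gamma \wedge 1}) = \mathscr{E}(S_1)$ for every $n$, (22) follows directly.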
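As a sanity check on (22), the following sketch computes the exact law of $S_{\gamma \wedge n}$ for the fair coin-tossing walk by propagating probabilities one step at a time. The function names and the values $a = 3$, $b = 2$ are illustrative choices, not part of the text, and the walk is assumed to start at $S_0 = 0$.

```python
from fractions import Fraction


def stopped_walk_law(a, b, n):
    """Exact law of S_{gamma ^ n} for the fair +-1 walk with S_0 = 0,
    where gamma is the first time the walk hits -a or b (a, b >= 1)."""
    half = Fraction(1, 2)
    law = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for pos, prob in law.items():
            if pos in (-a, b):          # already stopped: stay put
                nxt[pos] = nxt.get(pos, 0) + prob
            else:                       # still playing: one more fair toss
                for step in (-1, 1):
                    nxt[pos + step] = nxt.get(pos + step, 0) + prob * half
        law = nxt
    return law


def mean(law):
    """Expectation of a finitely supported law given as {value: probability}."""
    return sum(pos * prob for pos, prob in law.items())


law = stopped_walk_law(a=3, b=2, n=200)
print(mean(law))                       # expectation of S_{gamma ^ n}
print(float(law.get(-3, 0)), 2 / 5)    # P{stopped at -a by time n} vs b/(a+b)
```

The expectation stays exactly zero for every $n$, while the mass at $-a$ approaches $b/(a+b)$ as $n$ grows, in agreement with the passage to the limit used above.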

The ruin problem belonged to the ancient history of probability theory, 
and can be solved by elementary methods based on difference equations (see, 
e.g., Uspensky, Introduction to mathematical probability, McGraw-Hill, New 
York, 1937). The approach sketched above, however, has all the main ingredi- 
ents of an elaborate modern theory. The little equation (22) is the prototype of 
a “harmonic equation”, and the problem itself is a “boundary-value problem”. 
The steps used in the solution—to wit: the introduction of a martingale, 
its optional stopping, its convergence to a limit, and the extension of the 
martingale property to include the limit with the consequent convergence of 
expectations —are all part of a standard procedure now ensconced in the 
general theory of Markov processes and the allied potential theory. 


EXERCISES 


1. The defining relation for a martingale may be generalized as follows.
For each optional r.v. $\alpha \le n$, we have $\mathscr{E}\{X_n \mid \mathscr{F}_\alpha\} = X_\alpha$. Similarly for a
smartingale.

*2. If $X$ is an integrable r.v., then the collection of (equivalence classes
of) r.v.'s $\mathscr{E}(X \mid \mathscr{G})$ with $\mathscr{G}$ ranging over all Borel subfields of $\mathscr{F}$, is uniformly
integrable.

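3. Suppose $\{X_n^{(k)}, \mathscr{F}_n\}$, $k = 1, 2$, are two [super]martingales, $\alpha$ is a finite
optional r.v., and $X_\alpha^{(1)} = [\ge]\, X_\alpha^{(2)}$. Define $X_n = X_n^{(1)} 1_{\{n < \alpha\}} + X_n^{(2)} 1_{\{n \ge \alpha\}}$; show
that $\{X_n, \mathscr{F}_n\}$ is a [super]martingale. [HINT: Verify the defining relation in (4)
for $m = n + 1$.]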

4. Suppose each $X_n$ is integrable and

$$\mathscr{E}\{X_{n+1} \mid X_1, \ldots, X_n\} = n^{-1}(X_1 + \cdots + X_n);$$

then $\{n^{-1}(X_1 + \cdots + X_n), n \in \mathbf{N}\}$ is a martingale.

5. Every sequence of integrable r.v.’s is the sum of a supermartingale 
and a submartingale. 

6. If $\{X_n, \mathscr{F}_n\}$ and $\{X'_n, \mathscr{F}_n\}$ are martingales, then so is $\{X_n + X'_n, \mathscr{F}_n\}$.
But it may happen that $\{X_n\}$ and $\{X'_n\}$ are martingales while $\{X_n + X'_n\}$ is not.
[HINT: Let $x_1$ and $x'_1$ be independent Bernoullian r.v.'s; and $x_2 = x'_2 = +1$ or
$-1$ according as $x_1 + x'_1 = 0$ or $\ne 0$; notation as in (7).]

7. Find an example of a positive martingale which is not uniformly
integrable. [HINT: You win $2^n$ if it's heads $n$ times in a row, and you lose everything
as soon as it's tails.]

8. Find an example of a martingale $\{X_n\}$ such that $X_n \to -\infty$ a.e. This
implies that even in a "fair" game one player may be bound to lose an
arbitrarily large amount if he plays long enough (and no limit is set to the
liability of the other player). [HINT: Try sums of independent but not identically
distributed r.v.'s with mean 0.]

*9. Prove that if $\{Y_n, \mathscr{F}_n\}$ is a martingale such that $Y_n \in \mathscr{F}_{n-1}$, then for
every $n$, $Y_n = Y_1$ a.e. Deduce from this result that Doob's decomposition (6)
is unique (up to equivalent r.v.'s) under the condition that $Z_n \in \mathscr{F}_{n-1}$ for every
$n \ge 2$. If this condition is not imposed, find two different decompositions.

10. If $\{X_n\}$ is a uniformly integrable submartingale, then for any optional
r.v. $\alpha$ we have

(i) $\{X_{\alpha \wedge n}\}$ is a uniformly integrable submartingale;
(ii) $\mathscr{E}(X_1) \le \mathscr{E}(X_\alpha) \le \sup_n \mathscr{E}(X_n)$.

[HINT: $|X_{\alpha \wedge n}| \le |X_\alpha| + |X_n|$.]
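*11. Let $\{X_n, \mathscr{F}_n; n \in \mathbf{N}\}$ be a [super]martingale satisfying the following
condition: there exists a constant $M$ such that for every $n \ge 1$:

$$\mathscr{E}\{|X_n - X_{n-1}| \mid \mathscr{F}_{n-1}\} \le M \quad \text{a.e.}$$

where $X_0 = 0$ and $\mathscr{F}_0$ is trivial. Then for any two optional r.v.'s $\alpha$ and $\beta$
such that $\alpha \le \beta$ and $\mathscr{E}(\beta) < \infty$, $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ is a [super]martingale. This
is another case of optional sampling given by Doob, which includes Wald's
equation (Theorem 5.5.3) as a special case. [HINT: Dominate the integrand in
the second integral in (17) by $Y_\beta$ where $X_0 = 0$ and $Y_m = \sum_{n=1}^{m} |X_n - X_{n-1}|$.
We have

$$\mathscr{E}(Y_\beta) = \sum_{n=1}^{\infty} \int_{\{\beta \ge n\}} |X_n - X_{n-1}| \, d\mathscr{P} \le M\,\mathscr{E}(\beta).]$$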


12. Apply Exercise 11 to the gambler's ruin problem discussed in the
text and conclude that for the $\alpha$ in (20) we must have $\mathscr{E}(\alpha) = +\infty$. Verify
this by elementary computation.

*13. In the gambler's ruin problem take $b = 1$ in (20). Compute $\mathscr{E}(S_{\beta \wedge n})$
for a fixed $n$ and show that $\{S_0, S_{\beta \wedge n}\}$ forms a martingale. Observe that $\{S_0, S_\beta\}$
does not form a martingale and explain in gambling terms the effect of stopping
$\beta$ at $n$. This example shows why in optional sampling the option may be taken
even with the knowledge of the present moment under certain conditions. In
the case here the present (namely $\beta \wedge n$) may leave one no choice!


14. In the gambler's ruin problem, suppose that $S_1$ has the distribution

$$p\,\delta_1 + (1 - p)\,\delta_{-1}, \qquad p \ne \tfrac{1}{2};$$

and let $d = 2p - 1$. Show that $\mathscr{E}(S_\gamma) = d\,\mathscr{E}(\gamma)$. Compute the probabilities of
ruin by using difference equations to deduce $\mathscr{E}(\gamma)$, and vice versa.
15. Prove that for any $L^1$-bounded smartingale $\{X_n, \mathscr{F}_n, n \in \mathbf{N}_\infty\}$, and
any optional $\alpha$, we have $\mathscr{E}(|X_\alpha|) < \infty$. [HINT: Prove the result first for a
martingale, then use Doob's decomposition.]
*16. Let $\{X_n, \mathscr{F}_n\}$ be a martingale: $x_1 = X_1$, $x_n = X_n - X_{n-1}$ for $n \ge 2$;
let $v_n \in \mathscr{F}_{n-1}$ for $n \ge 1$ where $\mathscr{F}_0 = \mathscr{F}_1$; now put

$$T_n = \sum_{j=1}^{n} v_j x_j.$$

Show that $\{T_n, \mathscr{F}_n\}$ is a martingale provided that $T_n$ is integrable for every $n$.
The martingale may be replaced by a smartingale if $v_n \ge 0$ for every $n$. As
a particular case take $v_n = 1_{\{n \le \alpha\}}$ where $\alpha$ is an optional r.v. relative to $\{\mathscr{F}_n\}$.
What then is $T_n$? Hence deduce the Corollary to Theorem 9.3.4.

17. As in the preceding exercise, deduce a new proof of Theorem 9.3.4
by taking $v_n = 1_{\{\alpha < n \le \beta\}}$.


9.4 Inequalities and convergence 


We begin with two inequalities, the first of which is a generalization of 
Kolmogorov’s inequality (Theorem 5.3.1). 


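Theorem 9.4.1. If $\{X_j, \mathscr{F}_j, j \in \mathbf{N}_n\}$ is a submartingale, then for each real $\lambda$
we have

(1) $$\lambda\,\mathscr{P}\{\max_{1 \le j \le n} X_j \ge \lambda\} \le \int_{\{\max_{1 \le j \le n} X_j \ge \lambda\}} X_n \, d\mathscr{P} \le \mathscr{E}(X_n^+);$$

(2) $$\lambda\,\mathscr{P}\{\min_{1 \le j \le n} X_j \le -\lambda\} \le \mathscr{E}(X_n - X_1) - \int_{\{\min_{1 \le j \le n} X_j \le -\lambda\}} X_n \, d\mathscr{P} \le \mathscr{E}(X_n^+) - \mathscr{E}(X_1).$$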


PROOF. Let $\alpha$ be the first $j$ such that $X_j \ge \lambda$ if there is such a $j$ in $\mathbf{N}_n$,
otherwise let $\alpha = n$ (optional stopping at $n$). It is clear that $\alpha$ is optional;
since it takes only a finite number of values, Theorem 9.3.4 shows that the
pair $\{X_\alpha, X_n\}$ forms a submartingale. If we write

$$M = \{\max_{1 \le j \le n} X_j \ge \lambda\},$$

then $M \in \mathscr{F}_\alpha$ (why?) and $X_\alpha \ge \lambda$ on $M$, hence the first inequality follows
from

$$\lambda\,\mathscr{P}(M) \le \int_M X_\alpha \, d\mathscr{P} \le \int_M X_n \, d\mathscr{P};$$

the second is just a cruder consequence.
Similarly let $\beta$ be the first $j$ such that $X_j \le -\lambda$ if there is such a $j$ in
$\mathbf{N}_n$, otherwise let $\beta = n$. Put also

$$M_k = \{\min_{1 \le j \le k} X_j \le -\lambda\}.$$

Then $\{X_1, X_\beta\}$ is a submartingale by Theorem 9.3.4, and so

$$\mathscr{E}(X_1) \le \mathscr{E}(X_\beta) = \int_{\{\beta \le n-1\}} X_\beta \, d\mathscr{P} + \int_{M_{n-1}^c \cap M_n} X_n \, d\mathscr{P} + \int_{M_n^c} X_n \, d\mathscr{P}$$
$$\le -\lambda\,\mathscr{P}(M_n) + \mathscr{E}(X_n) - \int_{M_n} X_n \, d\mathscr{P},$$

which reduces to (2).


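Corollary 1. If $\{X_n\}$ is a martingale, then for each $\lambda > 0$:

(3) $$\mathscr{P}\{\max_{1 \le j \le n} |X_j| \ge \lambda\} \le \frac{1}{\lambda} \int_{\{\max_{1 \le j \le n} |X_j| \ge \lambda\}} |X_n| \, d\mathscr{P} \le \frac{1}{\lambda}\,\mathscr{E}(|X_n|).$$

If in addition $\mathscr{E}(X_n^2) < \infty$ for each $n$, then we have also

(4) $$\mathscr{P}\{\max_{1 \le j \le n} |X_j| \ge \lambda\} \le \frac{1}{\lambda^2}\,\mathscr{E}(X_n^2).$$

These are obtained by applying the theorem to the submartingales $\{|X_n|\}$
and $\{X_n^2\}$. In case $X_n$ is the $S_n$ in Theorem 5.3.1, (4) is precisely the
Kolmogorov inequality there.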


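Corollary 2. Let $1 \le m \le n$, $\Lambda_m \in \mathscr{F}_m$, and $M = \{\max_{m \le j \le n} X_j \ge \lambda\}$, then

$$\lambda\,\mathscr{P}\{\Lambda_m \cap M\} \le \int_{\Lambda_m \cap M} X_n \, d\mathscr{P}.$$

This is proved just as (1) and will be needed later.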
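Inequalities (3) and (4) can be checked exactly on a small case. The sketch below enumerates all $2^n$ paths of the fair coin-tossing walk, which is a martingale with $\mathscr{E}(S_n^2) = n$, and returns the four quantities appearing in (3) and (4). The function name and the restriction to integer $\lambda$ are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def maximal_inequality_terms(n, lam):
    """Exact terms of (3) and (4) for the fair +-1 walk S_1, ..., S_n.

    Returns (p, m1, m2, m3) where
      p  = P{max_j |S_j| >= lam},
      m1 = (1/lam) * E(|S_n|; max_j |S_j| >= lam),
      m2 = (1/lam) * E|S_n|,
      m3 = (1/lam^2) * E(S_n^2).
    lam is assumed to be a positive integer so that Fractions stay exact.
    """
    total = 2 ** n
    hits = abs_on_hits = abs_all = sq_all = 0
    for steps in product((-1, 1), repeat=n):
        s = peak = 0
        for step in steps:
            s += step
            peak = max(peak, abs(s))   # running maximum of |S_j|
        abs_all += abs(s)
        sq_all += s * s
        if peak >= lam:
            hits += 1
            abs_on_hits += abs(s)
    return (
        Fraction(hits, total),
        Fraction(abs_on_hits, total * lam),
        Fraction(abs_all, total * lam),
        Fraction(sq_all, total * lam * lam),
    )


for lam in range(1, 5):
    p, m1, m2, m3 = maximal_inequality_terms(8, lam)
    print(lam, p, m1, m2, m3)
```

For each $\lambda$ the printed values satisfy $p \le m_1 \le m_2$, which is (3), and $p \le m_3$, which is (4).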
We now come to a new kind of inequality, which will be the tool for
proving the main convergence theorem below. Given any sequence of r.v.'s
$\{X_j\}$, for each sample point $\omega$, the convergence properties of the numerical
sequence $\{X_j(\omega)\}$ hinge on the oscillation of the finite segments $\{X_j(\omega), j \in
\mathbf{N}_n\}$ as $n \to \infty$. In particular the sequence will have a limit, finite or infinite, if
and only if the number of its oscillations between any two [rational] numbers $a$
and $b$ is finite (depending on $a$, $b$ and $\omega$). This is a standard type of argument
used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The
interesting thing is that for a smartingale, a sharp estimate of the expected
number of oscillations is obtainable.

Let $a < b$. The number $\nu$ of "upcrossings" of the interval $[a, b]$ by a
numerical sequence $\{x_1, \ldots, x_n\}$ is defined as follows. Set

$$\alpha_1 = \min\{j: 1 \le j \le n,\ x_j \le a\},$$
$$\alpha_2 = \min\{j: \alpha_1 < j \le n,\ x_j \ge b\};$$

if either $\alpha_1$ or $\alpha_2$ is not defined because no such $j$ exists, we define $\nu = 0$. In
general, for $k \ge 2$ we set

$$\alpha_{2k-1} = \min\{j: \alpha_{2k-2} < j \le n,\ x_j \le a\},$$
$$\alpha_{2k} = \min\{j: \alpha_{2k-1} < j \le n,\ x_j \ge b\};$$

if any one of these is undefined, then all the subsequent ones will be undefined.
Let $\alpha_\ell$ be the last defined one, with $\ell = 0$ if $\alpha_1$ is undefined, then $\nu$ is defined
to be $[\ell/2]$. Thus $\nu$ is the actual number of successive times that the sequence
crosses from $\le a$ to $\ge b$. Although the exact number is not essential, since a
couple of crossings more or less would make no difference, we must adhere
to a rigid way of counting in order to be accurate below.
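The counting rule above is a single left-to-right scan, and a short sketch may help fix the convention. The function name `upcrossings` and the sample sequence are illustrative only; indices are 1-based to match the $\alpha_k$ of the definition.

```python
def upcrossings(xs, a, b):
    """Return (nu, alphas) for the sequence xs = [x_1, ..., x_n].

    alphas lists the defined alpha_1, alpha_2, ... (1-based indices) and
    nu = [l / 2], where l is the number of alphas that are defined.
    """
    assert a < b
    alphas = []
    waiting_for_low = True              # odd-numbered alpha: need x_j <= a
    for j, x in enumerate(xs, start=1):
        if waiting_for_low and x <= a:
            alphas.append(j)
            waiting_for_low = False     # next, look for x_j >= b
        elif not waiting_for_low and x >= b:
            alphas.append(j)
            waiting_for_low = True      # one upcrossing completed
    return len(alphas) // 2, alphas


print(upcrossings([0, 2, -1, 3, 1, 0, 5], a=0, b=2))   # (3, [1, 2, 3, 4, 6, 7])
```

In the sample sequence the value 1 at index 5 is ignored: it lies strictly between $a$ and $b$, so it neither starts nor completes a crossing.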


Theorem 9.4.2. Let $\{X_j, \mathscr{F}_j, j \in \mathbf{N}_n\}$ be a submartingale and $-\infty < a <
b < \infty$. Let $\nu^{(n)}_{[a,b]}(\omega)$ denote the number of upcrossings of $[a, b]$ by the sample
sequence $\{X_j(\omega); j \in \mathbf{N}_n\}$. We have then

(5) $$\mathscr{E}\{\nu^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{(X_n - a)^+\} - \mathscr{E}\{(X_1 - a)^+\}}{b - a} \le \frac{\mathscr{E}\{X_n^+\} + |a|}{b - a}.$$

PROOF. Consider first the case where $X_j \ge 0$ for every $j$ and $0 = a < b$,
so that $\nu^{(n)}_{[a,b]}(\omega)$ becomes $\nu^{(n)}_{[0,b]}(\omega)$, and $X_{\alpha_j}(\omega) = 0$ if $j$ is odd, where $\alpha_j =
\alpha_j(\omega)$ is defined as above with $x_j = X_j(\omega)$. For each $\omega$, the sequence $\alpha_j(\omega)$
is defined only up to $\ell(\omega)$, where $0 \le \ell(\omega) \le n$. But now we modify the
definition so that $\alpha_j(\omega)$ is defined for $1 \le j \le n$ by setting it to be equal to $n$
wherever it was previously undefined. Since for some $\omega$, a previously defined
$\alpha_j(\omega)$ may also be equal to $n$, this apparent confusion will actually simplify
the formulas below. In the same vein we set $\alpha_0 = 1$. Observe that $\alpha_n = n$ in
any case, so that


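$$X_n - X_1 = X_{\alpha_n} - X_{\alpha_0} = \sum_{j=0}^{n-1} (X_{\alpha_{j+1}} - X_{\alpha_j}) = \sum_{j \text{ even}} + \sum_{j \text{ odd}}.$$

If $j$ is odd and $j + 1 \le \ell(\omega)$, then
$$X_{\alpha_{j+1}}(\omega) \ge b > 0 = X_{\alpha_j}(\omega);$$
if $j$ is odd and $j = \ell(\omega)$, then
$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) \ge 0 = X_{\alpha_j}(\omega);$$
if $j$ is odd and $\ell(\omega) < j$, then
$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) = X_{\alpha_j}(\omega).$$

Hence in all cases we have

(6) $$\sum_{j \text{ odd}} (X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)) \ge \sum_{\substack{j \text{ odd} \\ j+1 \le \ell(\omega)}} (X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)) \ge \left[\frac{\ell(\omega)}{2}\right] b = \nu^{(n)}_{[0,b]}(\omega)\, b.$$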


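Next, observe that $\{\alpha_j, 0 \le j \le n\}$ as modified above is in general of the form
$1 = \alpha_0 \le \alpha_1 < \alpha_2 < \cdots < \alpha_\ell \le \alpha_{\ell+1} = \cdots = \alpha_n = n$, and since constants
are optional, this is an increasing sequence of optional r.v.'s. Hence by
Theorem 9.3.4, $\{X_{\alpha_j}, 0 \le j \le n\}$ is a submartingale so that for each $j$, $0 \le
j \le n - 1$, we have $\mathscr{E}\{X_{\alpha_{j+1}} - X_{\alpha_j}\} \ge 0$ and consequently

$$\mathscr{E}\Big\{\sum_{j \text{ even}} (X_{\alpha_{j+1}} - X_{\alpha_j})\Big\} \ge 0.$$

Adding to this the expectations of the extreme terms in (6), we obtain

(7) $$\mathscr{E}(X_n - X_1) \ge \mathscr{E}(\nu^{(n)}_{[0,b]})\, b,$$

which is the particular case of (5) under consideration.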

In the general case we apply the case just proved to $\{(X_j - a)^+, j \in \mathbf{N}_n\}$,
which is a submartingale by Corollary 1 to Theorem 9.3.1. It is clear that
the number of upcrossings of $[a, b]$ by the given submartingale is exactly
that of $[0, b - a]$ by the modified one. The inequality (7) becomes the first
inequality in (5) after the substitutions, and the second one follows since
$(X_n - a)^+ \le X_n^+ + |a|$. Theorem 9.4.2 is proved.
The corresponding result for a supermartingale will be given below; but
after such a painstaking definition of upcrossing, we may leave the dual defi- 
nition of downcrossing to the reader. 


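Theorem 9.4.3. Let $\{X_j, \mathscr{F}_j; j \in \mathbf{N}_n\}$ be a supermartingale and let $-\infty <
a < b < \infty$. Let $\tilde{\nu}^{(n)}_{[a,b]}$ be the number of downcrossings of $[a, b]$ by the sample
sequence $\{X_j(\omega), j \in \mathbf{N}_n\}$. We have then

(8) $$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{X_1 \wedge b\} - \mathscr{E}\{X_n \wedge b\}}{b - a}.$$

PROOF. $\{-X_j, j \in \mathbf{N}_n\}$ is a submartingale and $\tilde{\nu}^{(n)}_{[a,b]}$ is $\nu^{(n)}_{[-b,-a]}$ for this
submartingale. Hence the first part of (5) becomes

$$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{(-X_n + b)^+ - (-X_1 + b)^+\}}{-a - (-b)} = \frac{\mathscr{E}\{(b - X_n)^+ - (b - X_1)^+\}}{b - a}.$$

Since $(b - x)^+ = b - (b \wedge x)$ this is the same as in (8).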


Corollary. For a positive supermartingale we have for $0 \le a < b < \infty$

$$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{b}{b - a}.$$

G. Letta proved the sharper "dual":

$$\mathscr{E}\{\nu^{(n)}_{[a,b]}\} \le \frac{a}{b - a}.$$


(Martingales et intégration stochastique, Quaderni, Pisa, 1984, 48-49.) 
The basic convergence theorem is an immediate consequence of the 
upcrossing inequality. 


Theorem 9.4.4. If $\{X_n, \mathscr{F}_n; n \in \mathbf{N}\}$ is an $L^1$-bounded submartingale, then
$\{X_n\}$ converges a.e. to a finite limit.


Remark. Since
$$\mathscr{E}(|X_n|) = 2\mathscr{E}(X_n^+) - \mathscr{E}(X_n) \le 2\mathscr{E}(X_n^+) - \mathscr{E}(X_1),$$
the condition of $L^1$-boundedness is equivalent to the apparently weaker one
below:
(9) $$\sup_n \mathscr{E}(X_n^+) < \infty.$$


PROOF. Let $\nu_{[a,b]} = \lim_n \nu^{(n)}_{[a,b]}$. Our hypothesis implies that the last term
in (5) is bounded in $n$; letting $n \to \infty$, we obtain $\mathscr{E}\{\nu_{[a,b]}\} < \infty$ for every $a$
and $b$, and consequently $\nu_{[a,b]}$ is finite with probability one. Hence, for each
pair of rational numbers $a < b$, the set

$$\Lambda_{[a,b]} = \{\liminf_n X_n < a < b < \limsup_n X_n\}$$

is a null set; and so is the union over all such pairs. Since this union contains
the set where $\liminf_n X_n < \limsup_n X_n$, the limit exists a.e. It must be finite a.e. by
Fatou's lemma applied to the sequence $\{|X_n|\}$.


Corollary. Every uniformly bounded smartingale converges a.e. Every posi- 
tive supermartingale and every negative submartingale converge a.e. 


It may be instructive to sketch a direct proof of Theorem 9.4.4 which is 
done “by hand”, so to speak. This is the original proof given by Doob (1940) 
for a martingale. 

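Suppose that the set $\Lambda_{[a,b]}$ above has probability $> \eta > 0$. For each $\omega$ in
$\Lambda_{[a,b]}$, the sequence $\{X_n(\omega), n \in \mathbf{N}\}$ takes an infinite number of values $< a$
and an infinite number of values $> b$. Let $1 = n_0 < n_1 < \cdots$ and put

$$\Lambda_{2j-1} = \{\min_{n_{2j-2} < i \le n_{2j-1}} X_i < a\}, \qquad \Lambda_{2j} = \{\max_{n_{2j-1} < i \le n_{2j}} X_i > b\}.$$

Then for each $k$ it is possible to choose the $n_i$'s successively so that the
differences $n_i - n_{i-1}$ for $1 \le i \le 2k$ are so large that "most" of $\Lambda_{[a,b]}$ is contained
in $\bigcap_{i=1}^{2k} \Lambda_i$, so that

$$\mathscr{P}\Big\{\bigcap_{i=1}^{2k} \Lambda_i\Big\} > \eta.$$

Fixing an $n > n_{2k}$ and applying Corollary 2 to Theorem 9.4.1 to $\{-X_i\}$ as
well as $\{X_i\}$, we have

$$a\,\mathscr{P}\Big(\bigcap_{i=1}^{2j-1} \Lambda_i\Big) \ge \int_{\bigcap_{i=1}^{2j-1} \Lambda_i} X_{n_{2j-1}} \, d\mathscr{P} = \int_{\bigcap_{i=1}^{2j-1} \Lambda_i} X_n \, d\mathscr{P},$$

$$b\,\mathscr{P}\Big(\bigcap_{i=1}^{2j} \Lambda_i\Big) \le \int_{\bigcap_{i=1}^{2j} \Lambda_i} X_{n_{2j}} \, d\mathscr{P} = \int_{\bigcap_{i=1}^{2j} \Lambda_i} X_n \, d\mathscr{P},$$

where the equalities follow from the martingale property. Upon subtraction
we obtain

$$(b - a)\,\mathscr{P}\Big(\bigcap_{i=1}^{2j} \Lambda_i\Big) - a\,\mathscr{P}\Big(\bigcap_{i=1}^{2j-1} \Lambda_i \cap \Lambda_{2j}^c\Big) \le -\int_{\bigcap_{i=1}^{2j-1} \Lambda_i \cap \Lambda_{2j}^c} X_n \, d\mathscr{P},$$

and consequently, upon summing over $1 \le j \le k$:

$$k(b - a)\eta - |a| \le \mathscr{E}(|X_n|).$$

This is impossible if $k$ is large enough, since $\{X_n\}$ is $L^1$-bounded.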
Once Theorem 9.4.4 has been proved for a martingale, we can extend it
easily to a positive or uniformly integrable supermartingale by using Doob's
decomposition. Suppose $\{X_n\}$ is a positive supermartingale and $X_n = Y_n - Z_n$
as in Theorem 9.3.2. Then $0 \le Z_n \le Y_n$ and consequently

$$\mathscr{E}(Z_\infty) = \lim_{n \to \infty} \mathscr{E}(Z_n) \le \mathscr{E}(Y_1);$$

next we have

$$\mathscr{E}(Y_n) = \mathscr{E}(X_n) + \mathscr{E}(Z_n) \le \mathscr{E}(X_1) + \mathscr{E}(Z_\infty).$$

Hence $\{Y_n\}$ is an $L^1$-bounded martingale and so converges to a finite limit
as $n \to \infty$. Since $Z_n \uparrow Z_\infty < \infty$ a.e., the convergence of $\{X_n\}$ follows. The
case of a uniformly integrable supermartingale is just as easy by the corollary
to Theorem 9.3.2.

It is trivial that a positive submartingale need not converge, since the
sequence $\{n\}$ is such a one. The classical random walk $\{S_n\}$ (coin-tossing
game) is an example of a martingale that does not converge (why?). An
interesting and not so trivial consequence is that both $\mathscr{E}(S_n^+)$ and $\mathscr{E}(|S_n|)$ must
diverge to $+\infty$! (Cf. Exercise 2 of Sec. 6.4.) Further examples are furnished
by "stopped random walk". For the sake of concreteness, let us stay with the
classical case and define $\gamma$ to be the first time the walk reaches $+1$. As in our
previous discussion of the gambler's-ruin problem, the modified random walk
$\{S_n^*\}$, where $S_n^* = S_{\gamma \wedge n}$, is still a martingale, hence in particular we have for
each $n$:

$$\mathscr{E}(S_n^*) = \mathscr{E}(S_1^*) = \int_{\{\gamma = 1\}} S_1 \, d\mathscr{P} + \int_{\{\gamma > 1\}} S_1 \, d\mathscr{P} = \mathscr{E}(S_1) = 0.$$

As in (21) of Sec. 9.3 we have, writing $S_\infty^* = S_\gamma = 1$,

$$\lim_n S_n^* = S_\infty^* \quad \text{a.e.},$$

since $\gamma < \infty$ a.e., but this convergence now also follows from Theorem 9.4.4,
since $(S_n^*)^+ \le 1$. Observe, however, that

$$\mathscr{E}(S_n^*) = 0 < 1 = \mathscr{E}(S_\infty^*).$$

Next, we change the definition of $\gamma$ to be the first time ($\ge 1$) the walk "returns"
to 0, as usual supposing $S_0 = 0$. Then $S_\infty^* = 0$ and we have indeed $\mathscr{E}(S_n^*) =
\mathscr{E}(S_\infty^*)$. But for each $n$,

$$\int_{\{S_n^* > 0\}} S_n^* \, d\mathscr{P} > 0 = \int_{\{S_n^* > 0\}} S_\infty^* \, d\mathscr{P},$$

so that the "extended sequence" $\{S_1^*, \ldots, S_n^*, \ldots, S_\infty^*\}$ is no longer a
martingale. These diverse circumstances will be dealt with below.
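The first of these examples, the walk stopped on reaching $+1$, can be examined numerically. The sketch below computes the exact law of $S_n^* = S_{\gamma \wedge n}$; the function names are illustrative, and the walk is assumed to be the fair coin-tossing walk started at $S_0 = 0$.

```python
from fractions import Fraction


def stopped_at_one_law(n):
    """Exact law of S_{gamma ^ n}, where gamma is the first time the
    fair +-1 walk started at 0 reaches +1."""
    half = Fraction(1, 2)
    law = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for pos, prob in law.items():
            if pos == 1:                # already stopped at +1
                nxt[1] = nxt.get(1, 0) + prob
            else:                       # still below +1: toss again
                for step in (-1, 1):
                    nxt[pos + step] = nxt.get(pos + step, 0) + prob * half
        law = nxt
    return law


def mean(law):
    """Expectation of a finitely supported law given as {value: probability}."""
    return sum(pos * prob for pos, prob in law.items())


for n in (1, 10, 200):
    law = stopped_at_one_law(n)
    print(n, mean(law), float(law[1]))   # E(S*_n) and P{gamma <= n}
```

The mean is exactly zero for every $n$ although the mass at $+1$ tends to one; the missing expectation is carried by a shrinking probability of large negative values, which is why the sequence is not uniformly integrable.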
Theorem 9.4.5. The three propositions below are equivalent for a
submartingale $\{X_n, \mathscr{F}_n; n \in \mathbf{N}\}$:


(a) it is a uniformly integrable sequence;

(b) it converges in $L^1$;

(c) it converges a.e. to an integrable $X_\infty$ such that $\{X_n, \mathscr{F}_n; n \in \mathbf{N}_\infty\}$ is
a submartingale and $\mathscr{E}(X_n)$ converges to $\mathscr{E}(X_\infty)$.


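PROOF. (a) $\Rightarrow$ (b): under (a) the condition in Theorem 9.4.4 is satisfied so
that $X_n \to X_\infty$ a.e. This together with uniform integrability implies $X_n \to X_\infty$
in $L^1$ by Theorem 4.5.4 with $r = 1$.

(b) $\Rightarrow$ (c): under (b) let $X_n \to X_\infty$ in $L^1$, then $\mathscr{E}(|X_n|) \to \mathscr{E}(|X_\infty|) <
\infty$ and so $X_n \to X_\infty$ a.e. by Theorem 9.4.4. For each $\Lambda \in \mathscr{F}_n$ and $n < n'$,
we have

$$\int_\Lambda X_n \, d\mathscr{P} \le \int_\Lambda X_{n'} \, d\mathscr{P}$$

by the defining relation. The right member converges to $\int_\Lambda X_\infty \, d\mathscr{P}$ by
$L^1$-convergence and the resulting inequality shows that $\{X_n, \mathscr{F}_n; n \in \mathbf{N}_\infty\}$ is a
submartingale. Since $L^1$-convergence also implies convergence of
expectations, all three conditions in (c) are proved.

(c) $\Rightarrow$ (a): under (c), $\{X_n^+, \mathscr{F}_n; n \in \mathbf{N}_\infty\}$ is a submartingale; hence we
have for every $\lambda > 0$:

(10) $$\int_{\{X_n^+ > \lambda\}} X_n^+ \, d\mathscr{P} \le \int_{\{X_n^+ > \lambda\}} X_\infty^+ \, d\mathscr{P},$$

which shows that $\{X_n^+, n \in \mathbf{N}\}$ is uniformly integrable. Since $X_n^+ \to X_\infty^+$ a.e.,
this implies $\mathscr{E}(X_n^+) \to \mathscr{E}(X_\infty^+)$. Since by hypothesis $\mathscr{E}(X_n) \to \mathscr{E}(X_\infty)$, it follows
that $\mathscr{E}(X_n^-) \to \mathscr{E}(X_\infty^-)$. This and $X_n^- \to X_\infty^-$ a.e. imply that $\{X_n^-\}$ is uniformly
integrable by Theorem 4.5.4 for $r = 1$. Hence so is $\{X_n\}$.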


Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above
are equivalent to (c') or (d) below:


(c') it converges a.e. to an integrable $X_\infty$ such that $\{X_n, \mathscr{F}_n; n \in \mathbf{N}_\infty\}$ is
a martingale;

(d) there exists an integrable r.v. $Y$ such that $X_n = \mathscr{E}(Y \mid \mathscr{F}_n)$ for each
$n \in \mathbf{N}$.


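PROOF. (b) $\Rightarrow$ (c') as before; (c') $\Rightarrow$ (a) as before if we observe that
$\mathscr{E}(X_n) = \mathscr{E}(X_\infty)$ for every $n$ in the present case, or more rapidly by
considering $|X_n|$ instead of $X_n^+$ as below. (c') $\Rightarrow$ (d) is trivial, since we may take
the $Y$ in (d) to be the $X_\infty$ in (c'). To prove (d) $\Rightarrow$ (a), let $n < n'$, then by
Theorem 9.1.5:

$$\mathscr{E}(X_{n'} \mid \mathscr{F}_n) = \mathscr{E}(\mathscr{E}(Y \mid \mathscr{F}_{n'}) \mid \mathscr{F}_n) = \mathscr{E}(Y \mid \mathscr{F}_n) = X_n,$$

hence $\{X_n, \mathscr{F}_n, n \in \mathbf{N}; Y, \mathscr{F}\}$ is a martingale by definition. Consequently
$\{|X_n|, \mathscr{F}_n, n \in \mathbf{N}; |Y|, \mathscr{F}\}$ is a submartingale, and we have for each $\lambda > 0$:

$$\int_{\{|X_n| > \lambda\}} |X_n| \, d\mathscr{P} \le \int_{\{|X_n| > \lambda\}} |Y| \, d\mathscr{P},$$

$$\mathscr{P}\{|X_n| > \lambda\} \le \frac{1}{\lambda}\,\mathscr{E}(|X_n|) \le \frac{1}{\lambda}\,\mathscr{E}(|Y|),$$

which together imply (a).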


Corollary. Under (d), $\{X_n, \mathscr{F}_n, n \in \mathbf{N}; X_\infty, \mathscr{F}_\infty; Y, \mathscr{F}\}$ is a martingale, where
$X_\infty$ is given in (c').


Recall that we have introduced martingales of the form in (d) earlier in 
(13) in Sec. 9.3. Now we know this class coincides with the class of uniformly 
integrable martingales. 

We have already observed that the defining relation for a smartingale
is meaningful on any linearly ordered (or even partially ordered) index set.
The idea of extending the latter to a limit index is useful in applications to
continuous-time stochastic processes, where, for example, a martingale may
be defined on a dense set of real numbers in $(t_1, t_2)$ and extended to $t_2$.
This corresponds to the case of extension from $\mathbf{N}$ to $\mathbf{N}_\infty$. The dual extension
corresponding to that to $t_1$ will now be considered. Let $-\mathbf{N}$ denote the set of
strictly negative integers in their natural order, let $-\infty$ precede every element
in $-\mathbf{N}$, and denote by $-\mathbf{N}_\infty$ the set $\{-\infty\} \cup (-\mathbf{N})$ in the prescribed order.
If $\{\mathscr{F}_n, n \in -\mathbf{N}\}$ is a decreasing (with decreasing $n$) sequence of Borel fields,
their intersection $\bigcap_{n \in -\mathbf{N}} \mathscr{F}_n$ will be denoted by $\mathscr{F}_{-\infty}$.

The convergence results for a submartingale on —N are simpler because 
the right side of the upcrossing inequality (5) involves the expectation of the 
r.v. with the largest index, which in this case is the fixed —1 rather than the 
previous varying n. Hence for mere convergence there is no need for an extra 
condition such as (9). 


Theorem 9.4.7. Let $\{X_n, n \in -\mathbf{N}\}$ be a submartingale. Then

(11) $$\lim_{n \to -\infty} X_n = X_{-\infty}, \quad \text{where } -\infty \le X_{-\infty} < \infty \text{ a.e.}$$

The following conditions are equivalent, and they are automatically satisfied
in case of a martingale with "submartingale" replaced by "martingale" in (c):

(a) $\{X_n\}$ is uniformly integrable;

(b) $X_n \to X_{-\infty}$ in $L^1$;

(c) $\{X_n, n \in -\mathbf{N}_\infty\}$ is a submartingale;

(d) $\lim_{n \to -\infty} \downarrow \mathscr{E}(X_n) > -\infty$.


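PROOF. Let $\nu^{(n)}_{[a,b]}$ be the number of upcrossings of $[a, b]$ by the sequence
$\{X_{-n}, \ldots, X_{-1}\}$. We have from Theorem 9.4.2:

$$\mathscr{E}(\nu^{(n)}_{[a,b]}) \le \frac{\mathscr{E}(X_{-1}^+) + |a|}{b - a}.$$

Letting $n \to \infty$ and arguing as in the proof of Theorem 9.4.4, we conclude (11)
by observing that

$$\mathscr{E}(X_{-\infty}^+) \le \liminf_n \mathscr{E}(X_{-n}^+) \le \mathscr{E}(X_{-1}^+) < \infty.$$

The proofs of (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are entirely similar to those in Theorem 9.4.5.
(c) $\Rightarrow$ (d) is trivial, since $-\infty < \mathscr{E}(X_{-\infty}) \le \mathscr{E}(X_{-n})$ for each $n$. It remains
to prove (d) $\Rightarrow$ (a). Letting $C$ denote the limit in (d), we have for each $\lambda > 0$:

(12) $$\lambda\,\mathscr{P}\{|X_n| > \lambda\} \le \mathscr{E}(|X_n|) = 2\mathscr{E}(X_n^+) - \mathscr{E}(X_n) \le 2\mathscr{E}(X_{-1}^+) - C < \infty.$$

It follows that $\mathscr{P}\{|X_n| > \lambda\}$ converges to zero uniformly in $n$ as $\lambda \to \infty$. Since

$$\int_{\{X_n^+ > \lambda\}} X_n^+ \, d\mathscr{P} \le \int_{\{X_n^+ > \lambda\}} X_{-1}^+ \, d\mathscr{P},$$

this implies that $\{X_n^+\}$ is uniformly integrable. Next if $n < m$, then

$$0 \ge \int_{\{X_n < -\lambda\}} X_n \, d\mathscr{P} = \mathscr{E}(X_n) - \int_{\{X_n \ge -\lambda\}} X_n \, d\mathscr{P}$$
$$\ge \mathscr{E}(X_n) - \int_{\{X_n \ge -\lambda\}} X_m \, d\mathscr{P}$$
$$= \mathscr{E}(X_n - X_m) + \mathscr{E}(X_m) - \int_{\{X_n \ge -\lambda\}} X_m \, d\mathscr{P}$$
$$= \mathscr{E}(X_n - X_m) + \int_{\{X_n < -\lambda\}} X_m \, d\mathscr{P}.$$

By (d), we may choose $-m$ so large that $\mathscr{E}(X_n - X_m) > -\epsilon$ for any given $\epsilon > 0$
and for every $n < m$. Having fixed such an $m$, we may choose $\lambda$ so large that

$$\sup_n \int_{\{X_n < -\lambda\}} |X_m| \, d\mathscr{P} < \epsilon$$

by the remark after (12). It follows that $\{X_n^-\}$ is also uniformly integrable, and
therefore (a) is proved.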


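The next result will be stated for the index set of all integers in their
natural order:

$$\{\ldots, -n, \ldots, -2, -1, 0, 1, 2, \ldots, n, \ldots\}.$$

Let $\{\mathscr{F}_n\}$ be increasing B.F.'s on this index set, namely: $\mathscr{F}_n \subset \mathscr{F}_m$ if $n \le m$. We may
"close" them at both ends by adjoining the B.F.'s below:

$$\mathscr{F}_{-\infty} = \bigwedge_n \mathscr{F}_n, \qquad \mathscr{F}_\infty = \bigvee_n \mathscr{F}_n.$$

Let $\{Y_n\}$ be r.v.'s indexed by all the integers. If the B.F.'s and r.v.'s are only given on $\mathbf{N}$
or $-\mathbf{N}$, they can be trivially extended to all the integers by putting $\mathscr{F}_n = \mathscr{F}_1$, $Y_n = Y_1$ for
all $n \le 0$, or $\mathscr{F}_n = \mathscr{F}_{-1}$, $Y_n = Y_{-1}$ for all $n \ge 0$. The following convergence
theorem is very useful.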


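Theorem 9.4.8. Suppose that the $Y_n$'s are dominated by an integrable r.v. $Z$:

(13) $$\sup_n |Y_n| \le Z;$$

and $\lim_n Y_n = Y_\infty$ or $Y_{-\infty}$ as $n \to \infty$ or $-\infty$. Then we have

(14a) $$\lim_{n \to \infty} \mathscr{E}\{Y_n \mid \mathscr{F}_n\} = \mathscr{E}\{Y_\infty \mid \mathscr{F}_\infty\};$$

(14b) $$\lim_{n \to -\infty} \mathscr{E}\{Y_n \mid \mathscr{F}_n\} = \mathscr{E}\{Y_{-\infty} \mid \mathscr{F}_{-\infty}\}.$$

In particular for a fixed integrable r.v. $Y$, we have

(15a) $$\lim_{n \to \infty} \mathscr{E}\{Y \mid \mathscr{F}_n\} = \mathscr{E}\{Y \mid \mathscr{F}_\infty\};$$

(15b) $$\lim_{n \to -\infty} \mathscr{E}\{Y \mid \mathscr{F}_n\} = \mathscr{E}\{Y \mid \mathscr{F}_{-\infty}\},$$

where the convergence holds also in $L^1$ in both cases.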


PROOF. We prove (15) first. Let $X_n = \mathscr{E}\{Y \mid \mathscr{F}_n\}$. For $n \in \mathbf{N}$, $\{X_n, \mathscr{F}_n\}$
is a martingale already introduced in (13) of Sec. 9.3; the same is true for
$n \in -\mathbf{N}$. To prove (15a), we apply Theorem 9.4.6 to deduce (c') there. It
remains to identify the limit $X_\infty$ with the right member of (15a). For each
$\Lambda \in \mathscr{F}_n$, we have

$$\int_\Lambda Y \, d\mathscr{P} = \int_\Lambda X_n \, d\mathscr{P} = \int_\Lambda X_\infty \, d\mathscr{P}.$$

Hence the equations hold also for $\Lambda \in \mathscr{F}_\infty$ (why?), and this shows that $X_\infty$
has the defining property of $\mathscr{E}(Y \mid \mathscr{F}_\infty)$, since $X_\infty \in \mathscr{F}_\infty$. Similarly, the limit
$X_{-\infty}$ in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there,
for each $\Lambda \in \mathscr{F}_{-\infty}$:

$$\int_\Lambda X_{-\infty} \, d\mathscr{P} = \int_\Lambda X_n \, d\mathscr{P} = \int_\Lambda Y \, d\mathscr{P}.$$

This shows that $X_{-\infty}$ is equal to the right member of (15b).
We can now prove (14a). Put for $m \in \mathbf{N}$:

$$W_m = \sup_{n \ge m} |Y_n - Y_\infty|;$$

then $|W_m| \le 2Z$ and $\lim_{m \to \infty} W_m = 0$ a.e. Applying (15a) to $W_m$ we obtain

$$\limsup_{n \to \infty} \mathscr{E}\{|Y_n - Y_\infty| \mid \mathscr{F}_n\} \le \lim_{n \to \infty} \mathscr{E}\{W_m \mid \mathscr{F}_n\} = \mathscr{E}\{W_m \mid \mathscr{F}_\infty\}.$$

As $m \to \infty$, the last term above converges to zero by dominated convergence
(see (vii) of Sec. 9.1). Hence the first term must be zero and this clearly
implies (14a). The proof of (14b) is completely similar.

Although the corollary below is a very special case we give it for histor- 
ical interest. It is called Paul Lévy’s zero-or-one law (1935) and includes 
Theorem 8.1.1 as a particular case. 


Corollary. If $\Lambda \in \mathscr{F}_\infty$, then

(16) $$\lim_{n \to \infty} \mathscr{P}(\Lambda \mid \mathscr{F}_n) = 1_\Lambda \quad \text{a.e.}$$


The reader is urged to ponder over the intuitive meaning of this result and 
judge for himself whether it is “obvious” or “incredible”. 


EXERCISES 


*1. Prove that for any smartingale, we have for each $\lambda > 0$:

$$\lambda\,\mathscr{P}\{\sup_n |X_n| \ge \lambda\} \le 3 \sup_n \mathscr{E}(|X_n|).$$

For a martingale or a positive or negative smartingale the constant 3 may be
replaced by 1.

2. Let $\{X_n\}$ be a positive supermartingale. Then for almost every $\omega$,
$X_k(\omega) = 0$ implies $X_n(\omega) = 0$ for all $n \ge k$. [This is the analogue of a
minimum principle in potential theory.]

3. Generalize the upcrossing inequality for a submartingale $\{X_n, \mathscr{F}_n\}$ as
follows:

$$\mathscr{E}\{\nu^{(n)}_{[a,b]} \mid \mathscr{F}_1\} \le \frac{\mathscr{E}\{(X_n - a)^+ \mid \mathscr{F}_1\} - (X_1 - a)^+}{b - a}.$$

Similarly, generalize the downcrossing inequality for a positive
supermartingale $\{X_n, \mathscr{F}_n\}$ as follows:

$$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]} \mid \mathscr{F}_1\} \le \frac{X_1 \wedge b}{b - a}.$$


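*4. As a sharpening of Theorems 9.4.2 and 9.4.3 we have, for a positive
supermartingale $\{X_n, \mathscr{F}_n, n \in \mathbf{N}\}$:

$$\mathscr{P}\{\nu^{(n)}_{[a,b]} \ge k\} \le \frac{\mathscr{E}(X_1 \wedge a)}{b} \left(\frac{a}{b}\right)^{k-1},$$

$$\mathscr{P}\{\tilde{\nu}^{(n)}_{[a,b]} \ge k\} \le \frac{\mathscr{E}(X_1 \wedge b)}{b} \left(\frac{a}{b}\right)^{k-1}.$$

These inequalities are due to Dubins. Derive Theorems 9.3.6 and 9.3.7 from
them. [HINT:

$$b\,\mathscr{P}\{\alpha_{2j} < n\} \le \int_{\{\alpha_{2j} < n\}} X_{\alpha_{2j}} \, d\mathscr{P} \le \int_{\{\alpha_{2j-1} < n\}} X_{\alpha_{2j}} \, d\mathscr{P}$$
$$\le \int_{\{\alpha_{2j-1} < n\}} X_{\alpha_{2j-1}} \, d\mathscr{P} \le a\,\mathscr{P}\{\alpha_{2j-1} < n\}$$

since $\{\alpha_{2j-1} < n\} \in \mathscr{F}_{\alpha_{2j-1}}$.]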

*5. Every $L^1$-bounded martingale is the difference of two positive
$L^1$-bounded martingales. This is due to Krickeberg. [HINT: Take one of them to
be $\lim_{k \to \infty} \mathscr{E}\{X_k^+ \mid \mathscr{F}_n\}$.]

*6. A smartingale $\{X_n, \mathscr{F}_n; n \in \mathbf{N}\}$ is said to be closable [on the right]
iff there exists a r.v. $X_\infty$ such that $\{X_n, \mathscr{F}_n; n \in \mathbf{N}_\infty\}$ is a smartingale of the
same kind. Prove that if so then we can always take $X_\infty = \lim_{n \to \infty} X_n$. This
supplies a missing link in the literature. [HINT: For a supermartingale consider
$X_n = \mathscr{E}(X_\infty \mid \mathscr{F}_n) + Y_n$, then $\{Y_n, \mathscr{F}_n\}$ is a positive supermartingale so we
may apply the convergence theorems to both terms of the decomposition.]

7. Prove a result for closability [on the left] which is similar to Exercise 6
but for the index set $-\mathbf{N}$. Give an example to show that in case of $\mathbf{N}$ we may
have $\lim_{n \to \infty} \mathscr{E}(X_n) \ne \mathscr{E}(X_\infty)$, whereas in case of $-\mathbf{N}$ closability implies
$\lim_{n \to -\infty} \mathscr{E}(X_n) = \mathscr{E}(X_{-\infty})$.

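8. Let $\{X_n, \mathscr{F}_n, n \in \mathbf{N}\}$ be a submartingale and let $\alpha$ be a finite optional
r.v. satisfying the conditions: (a) $\mathscr{E}(|X_\alpha|) < \infty$, and (b)

$$\lim_{n \to \infty} \int_{\{\alpha > n\}} |X_n| \, d\mathscr{P} = 0.$$

Then $\{X_{\alpha \wedge n}, \mathscr{F}_{\alpha \wedge n}; n \in \mathbf{N}_\infty\}$ is a submartingale. [HINT: for $\Lambda \in \mathscr{F}_{\alpha \wedge n}$, bound
$\int_\Lambda (X_\alpha - X_{\alpha \wedge n}) \, d\mathscr{P}$ below by interposing $X_{\alpha \wedge m}$ where $n < m$.]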
9. Let $\{X_n, \mathscr{F}_n; n \in \mathbf{N}\}$ be a supermartingale satisfying the condition
$\lim_{n \to \infty} \mathscr{E}(X_n) > -\infty$. Then we have the representation $X_n = X'_n + X''_n$ where
$\{X'_n, \mathscr{F}_n\}$ is a martingale and $\{X''_n, \mathscr{F}_n\}$ is a positive supermartingale such that
$\lim_{n \to \infty} X''_n = 0$ in $L^1$ as well as a.e. This is the analogue of F. Riesz's
decomposition of a superharmonic function, $X'_n$ being the harmonic part and
$X''_n$ the potential part. [HINT: Use Doob's decomposition $X_n = Y_n - Z_n$ and
put $X'_n = Y_n - \mathscr{E}(Z_\infty \mid \mathscr{F}_n)$.]

10. Let $\{X_n, \mathscr{F}_n\}$ be a potential; namely a positive supermartingale such
that $\lim_{n \to \infty} \mathscr{E}(X_n) = 0$; and let $X_n = Y_n - Z_n$ be the Doob decomposition
[cf. (6) of Sec. 9.3]. Show that

$$X_n = \mathscr{E}(Z_\infty \mid \mathscr{F}_n) - Z_n.$$


*11. If $\{X_n\}$ is a martingale or positive submartingale such that
$\sup_n \mathscr{E}(X_n^2) < \infty$, then $\{X_n\}$ converges in $L^2$ as well as a.e.

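12. Let $\{\xi_n, n \in \mathbf{N}\}$ be a sequence of independent and identically
distributed r.v.'s with zero mean and unit variance; and $S_n = \sum_{j=1}^{n} \xi_j$.
Then for any optional r.v. $\alpha$ relative to $\{\xi_n\}$ such that $\mathscr{E}(\sqrt{\alpha}) < \infty$, we
have $\mathscr{E}(|S_\alpha|) \le \sqrt{2}\,\mathscr{E}(\sqrt{\alpha})$ and $\mathscr{E}(S_\alpha) = 0$. This is an extension of Wald's
equation due to Louis Gordon. [HINT: Truncate $\alpha$ and put $\eta_k = (S_k^2/\sqrt{k}) -
(S_{k-1}^2/\sqrt{k-1})$; then

$$\mathscr{E}\{S_\alpha^2/\sqrt{\alpha}\} = \sum_{k=1}^{\infty} \int_{\{\alpha \ge k\}} \eta_k \, d\mathscr{P} \le \sum_{k=1}^{\infty} \mathscr{P}\{\alpha \ge k\}/\sqrt{k} \le 2\,\mathscr{E}(\sqrt{\alpha});$$

now use Schwarz's inequality followed by Fatou's lemma.]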

The next two problems are meant to give an idea of the passage from 
discrete parameter martingale theory to the continuous parameter theory. 

13. Let $\{X_t, \mathscr{F}_t; t \in [0, 1]\}$ be a continuous parameter supermartingale.
For each $t \in [0, 1]$ and sequence $\{t_n\}$ decreasing to $t$, $\{X_{t_n}\}$ converges a.e. and
in $L^1$. For each $t \in (0, 1]$ and sequence $\{t_n\}$ increasing to $t$, $\{X_{t_n}\}$ converges a.e.
but not necessarily in $L^1$. [HINT: In the second case consider $X_{t_n} - \mathscr{E}\{X_t \mid \mathscr{F}_{t_n}\}$.]

*14. In Exercise 13 let $Q$ be the set of rational numbers in $[0, 1]$. For
each $t \in (0, 1)$ both limits below exist a.e.:

$$\lim_{\substack{s \uparrow t \\ s \in Q}} X_s, \qquad \lim_{\substack{s \downarrow t \\ s \in Q}} X_s.$$

[HINT: Let $\{Q_n, n \ge 1\}$ be finite subsets of $Q$ such that $Q_n \uparrow Q$; and apply the
upcrossing inequality to $\{X_s, s \in Q_n\}$, then let $n \to \infty$.]
9.5 Applications


Although some of the major successes of martingale theory lie in the field of 
continuous-parameter stochastic processes, which cannot be discussed here, it 
has also made various important contributions within the scope of this work. 
We shall illustrate these below with a few that are related to our previous 
topics, and indicate some others among the exercises. 


(I) The notions of “at least once” and “infinitely often” 


These have been a recurring theme in Chapters 4, 5, 8, and 9 and play important
roles in the theory of random walk and its generalization to Markov
processes. Let $\{X_n, n \in N^0\}$ be an arbitrary stochastic process; the notation
for fields in Sec. 9.2 will be used. For each $n$ consider the events:

\[ \Lambda_n = \bigcup_{j=n}^{\infty} \{X_j \in B_j\}, \]

\[ M = \bigcap_{n=1}^{\infty} \Lambda_n = \{X_j \in B_j \text{ i.o.}\}, \]

where $B_n$ are arbitrary Borel sets.


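Theorem 9.5.1. We have

\[ \lim_{n \to \infty} \mathscr{P}\{\Lambda_{n+1} \mid \mathscr{F}_{[0,n]}\} = 1_M \quad \text{a.e.}, \tag{1} \]

where $\mathscr{F}_{[0,n]}$ may be replaced by $\mathscr{F}_{\{n\}}$ or $X_n$ if the process is Markovian.


PROOF. By Theorem 9.4.8, (14a), the limit is

\[ \mathscr{P}\{M \mid \mathscr{F}_{[0,\infty)}\} = 1_M. \]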


The next result is a “principle of recurrence” which is useful in Markov 
processes; it is an extension of the idea in Theorem 9.2.3 (see also Exercises 15 
and 16 of Sec. 9.2). 


Theorem 9.5.2. Let $\{X_n, n \in N^0\}$ be a Markov process and $A_n$, $B_n$ Borel
sets. Suppose that there exists $\delta > 0$ such that for every $n$,

\[ \mathscr{P}\Bigl\{ \bigcup_{j=n+1}^{\infty} [X_j \in B_j] \Bigm| X_n \Bigr\} \ge \delta \quad \text{a.e. on the set } \{X_n \in A_n\}; \tag{2} \]

then we have

\[ \mathscr{P}\{[X_j \in A_j \text{ i.o.}] \setminus [X_j \in B_j \text{ i.o.}]\} = 0. \tag{3} \]

PROOF. Let $\Delta = \{X_j \in A_j \text{ i.o.}\}$ and use the notation $\Lambda_n$ and $M$ above.
We may ignore the null sets in (1) and (2). Then if $\omega \in \Delta$, our hypothesis
implies that

\[ \mathscr{P}\{\Lambda_{n+1} \mid X_n\}(\omega) \ge \delta \quad \text{i.o.} \]

In view of (1) this is possible only if $\omega \in M$. Thus $\Delta \subset M$, which implies (3).


The intuitive meaning of the preceding theorem has been given by 
Doeblin as follows: if the chance of a pedestrian’s getting run over is greater 
than 6 > 0 each time he crosses a certain street, then he will not be crossing 
it indefinitely (since he will be killed first)! Here $\{X_n \in A_n\}$ is the event of
the $n$th crossing, $\{X_n \in B_n\}$ that of being run over at the $n$th crossing.


(II) Harmonic and superharmonic functions for a Markov process


Let $\{X_n, n \in N^0\}$ be a homogeneous Markov process as discussed in Sec. 9.2
with the transition probability function $P(\cdot, \cdot)$. An extended-valued function
$f$ on $\mathscr{R}^1$ is said to be harmonic (with respect to $P$) iff it is integrable with
respect to the measure $P(x, \cdot)$ for each $x$ and satisfies the following "harmonic
equation":

\[ \forall x \in \mathscr{R}^1: \quad f(x) = \int_{\mathscr{R}^1} P(x, dy) f(y). \tag{4} \]

It is superharmonic (with respect to $P$) iff the "=" in (4) is replaced by "$\ge$";
in this case $f$ may take the value $+\infty$.

Lemma. If $f$ is [super]harmonic, then $\{f(X_n), n \in N^0\}$, where $X_0 \equiv x_0$ for
some given $x_0$ in $\mathscr{R}^1$, is a [super]martingale.


PROOF. We have, recalling (14) of Sec. 9.2,

\[ \mathscr{E}\{f(X_n)\} = \int_{\mathscr{R}^1} P^{(n)}(x_0, dy) f(y) < \infty \]

as is easily seen by iterating (4) and applying an extended form of Fubini's
theorem (see, e.g., Neveu [6]). Next we have, upon substituting $X_n$ for $x$ in (4):

\[ f(X_n) = \int_{\mathscr{R}^1} P(X_n, dy) f(y) = \mathscr{E}\{f(X_{n+1}) \mid X_n\} = \mathscr{E}\{f(X_{n+1}) \mid \mathscr{F}_{[0,n]}\}, \]


where the second equation follows by Exercise 8 of Sec. 9.2 and the third by 
Markov property. This proves the lemma in the harmonic case; the other case 
is similar. (Why not also the “sub” case?) 
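As a small illustration of the harmonic equation (4), the sketch below uses a hypothetical example that is not in the text: the symmetric random walk on the integers $0, 1, \ldots, N$ absorbed at both ends. The names `transition_matrix` and `apply_kernel` are illustrative only. With exact rational arithmetic it checks that $f(x) = x/N$ satisfies (4), while $f(x) = x(N - x)$ satisfies (4) with "$\ge$".

```python
from fractions import Fraction

def transition_matrix(n_states):
    """Symmetric random walk on {0, ..., n_states - 1}, absorbed at both ends."""
    half = Fraction(1, 2)
    P = [[Fraction(0)] * n_states for _ in range(n_states)]
    P[0][0] = Fraction(1)
    P[-1][-1] = Fraction(1)
    for x in range(1, n_states - 1):
        P[x][x - 1] = half
        P[x][x + 1] = half
    return P

def apply_kernel(P, f):
    """Return x -> sum_y P(x, y) f(y), the right side of the harmonic equation (4)."""
    return [sum(P[x][y] * f[y] for y in range(len(f))) for x in range(len(f))]

N = 6
P = transition_matrix(N + 1)
f_harmonic = [Fraction(x, N) for x in range(N + 1)]       # f(x) = x / N
f_super = [Fraction(x * (N - x)) for x in range(N + 1)]   # f(x) = x (N - x)

Pf_harmonic = apply_kernel(P, f_harmonic)
Pf_super = apply_kernel(P, f_super)

is_harmonic = Pf_harmonic == f_harmonic
is_superharmonic = all(f_super[x] >= Pf_super[x] for x in range(N + 1))
print(is_harmonic, is_superharmonic)
```

By the lemma, $f(X_n)$ is then a martingale in the first case and a supermartingale in the second; in the second case the gap $f - Pf$ equals 1 at every interior state.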


The most important example of a harmonic function is the $g(\cdot, B)$ of
Exercise 10 of Sec. 9.2 for a given $B$; that of a superharmonic function is the
$f(\cdot, B)$ of Exercise 9 there. These assertions follow easily from their
probabilistic meanings given in the cited exercises, but purely analytic verifications
are also simple and instructive. Finally, if for some $B$ we have

\[ \pi(x) = \sum_{n=0}^{\infty} P^{(n)}(x, B) < \infty \]

for every $x$, then $\pi(\cdot)$ is superharmonic and is called the "potential" of the
set $B$.


Theorem 9.5.3. Suppose that the remote field of $\{X_n, n \in N^0\}$ is trivial.
Then each bounded harmonic function is a constant a.e. with respect to each
$\mu_n$, where $\mu_n$ is the p.m. of $X_n$.


PROOF. By Theorem 9.4.5, $f(X_n)$ converges a.e. to $Z$ such that

\[ \{f(X_n), \mathscr{F}_{[0,n]};\; Z, \mathscr{F}_{[0,\infty)}\} \]

is a martingale. Clearly $Z$ belongs to the remote field and so is a constant $c$
a.e. Since

\[ f(X_n) = \mathscr{E}\{Z \mid \mathscr{F}_{[0,n]}\}, \]

each $f(X_n)$ is the same constant $c$ a.e. Mapped into $\mathscr{R}^1$, the last assertion
becomes the conclusion of the theorem.


(III) The supremum of a submartingale


The first inequality in (1) of Sec. 9.4 is of a type familiar in ergodic theory 
and leads to the result below, which has been called the “dominated ergodic 
theorem” by Wiener. In the case where X,, is the sum of independent r.v.’s 
with mean 0, it is due to Marcinkiewicz and Zygmund. We write $\|X\|_p$ for
the $L^p$-norm of $X$: $\|X\|_p^p = \mathscr{E}(|X|^p)$.


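Theorem 9.5.4. Let $1 < p < \infty$ and $1/p + 1/q = 1$. Suppose that
$\{X_n, n \in N\}$ is a positive submartingale satisfying the condition

\[ \sup_n \mathscr{E}\{X_n^p\} < \infty. \tag{5} \]

Then $\sup_{n \in N} X_n \in L^p$ and

\[ \bigl\| \sup_n X_n \bigr\|_p \le q \sup_n \|X_n\|_p. \tag{6} \]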


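PROOF. The condition (5) implies that $\{X_n\}$ is uniformly integrable
(Exercise 8 of Sec. 4.5), hence by Theorem 9.4.5, $X_n \to X_\infty$ a.e. and
$\{X_n, n \in N_\infty\}$ is a submartingale. Writing $Y$ for $\sup_n X_n$, we have by an obvious
extension of the first equation in (1) of Sec. 9.4:

\[ \forall \lambda > 0: \quad \lambda \mathscr{P}\{Y \ge \lambda\} \le \int_{\{Y \ge \lambda\}} X_\infty \, d\mathscr{P}. \tag{7} \]

Now it turns out that such an inequality for any two r.v.'s $Y$ and $X_\infty$ implies
the inequality $\|Y\|_p \le q\|X_\infty\|_p$, from which (6) follows by Fatou's lemma.
This is shown by the calculation below, where $G(\lambda) = \mathscr{P}\{Y \ge \lambda\}$.

\begin{align*}
\mathscr{E}(Y^p) &= -\int_0^\infty \lambda^p \, dG(\lambda) \le \int_0^\infty p\lambda^{p-1} G(\lambda) \, d\lambda \\
 &\le \int_0^\infty p\lambda^{p-1} \Bigl[ \frac{1}{\lambda} \int_{\{Y \ge \lambda\}} X_\infty \, d\mathscr{P} \Bigr] d\lambda \\
 &= \int_\Omega X_\infty \Bigl[ \int_0^Y p\lambda^{p-2} \, d\lambda \Bigr] d\mathscr{P} \\
 &= q \int_\Omega X_\infty Y^{p-1} \, d\mathscr{P} \le q \|X_\infty\|_p \|Y^{p-1}\|_q \\
 &= q \|X_\infty\|_p \{\mathscr{E}(Y^p)\}^{1/q}.
\end{align*}

Since we do not yet know $\mathscr{E}(Y^p) < \infty$, it is necessary to replace $Y$ first
with $Y \wedge m$, where $m$ is a constant, and verify the truth of (7) after this
replacement, before dividing through in the obvious way. We then let $m \uparrow \infty$
and conclude (6).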


The result above is false for p = 1 and is replaced by a more complicated 
one (Exercise 7 below). 
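Inequality (6) can be checked exactly on a small case. The sketch below is an illustration under stated assumptions, not part of the text: it takes the positive submartingale $X_k = |S_k|$, where $S_k$ is the $\pm 1$ coin-tossing walk stopped at time $n$, and $p = q = 2$. The function name `doob_l2_check` is hypothetical. Squaring (6) gives $\mathscr{E}\{(\max_{k \le n} |S_k|)^2\} \le 4\,\mathscr{E}(S_n^2) = 4n$, and both sides are computed by enumerating all $2^n$ paths.

```python
from fractions import Fraction
from itertools import product

def doob_l2_check(n):
    """Enumerate all 2**n paths of a +-1 walk; return both sides of (6) squared, p = q = 2."""
    total_max_sq = 0
    total_end_sq = 0
    for steps in product((-1, 1), repeat=n):
        s = 0
        running_max = 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        total_max_sq += running_max ** 2
        total_end_sq += s ** 2
    paths = 2 ** n
    lhs = Fraction(total_max_sq, paths)        # E[(max_k |S_k|)^2]
    rhs = 4 * Fraction(total_end_sq, paths)    # q^2 * E[S_n^2], since E[S_k^2] = k increases in k
    return lhs, rhs

lhs, rhs = doob_l2_check(10)
print(float(lhs), float(rhs))
```

The left side necessarily lies between $\mathscr{E}(S_n^2) = n$ and the bound $4n$.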


(IV) Convergence of sums of independent r.v.’s 


We return to Theorem 5.3.4 and complete the discussion there by showing 
that the convergence in distribution of the series $\sum_n X_n$ already implies its
convergence a.e. This can also be proved by an analytic method based on 
estimation of ch.f.’s, but the martingale approach is more illuminating. 


Theorem 9.5.5. If $\{X_n, n \in N\}$ is a sequence of independent r.v.'s such that
$S_n = \sum_{j=1}^{n} X_j$ converges in distribution as $n \to \infty$, then $S_n$ converges a.e.


PROOF. Let $f_j$ be the ch.f. of $X_j$, so that

\[ \varphi_n = \prod_{j=1}^{n} f_j \]

is the ch.f. of $S_n$. By the convergence theorem of Sec. 6.3, $\varphi_n$ converges
everywhere to $\varphi$, the ch.f. of the limit distribution of $S_n$. We shall need only
this fact for $|t| \le t_0$, where $t_0$ is so small that $\varphi(t) \ne 0$ for $|t| \le t_0$; then this
is also true of $\varphi_n$ for all sufficiently large $n$. For such values of $n$ and a fixed
$t$ with $|t| \le t_0$, we define the complex-valued r.v. $Z_n$ as follows:

\[ Z_n = \frac{e^{itS_n}}{\varphi_n(t)}. \tag{8} \]

Then each $Z_n$ is integrable; indeed the sequence $\{Z_n\}$ is uniformly bounded.
We have for each $n$, if $\mathscr{F}_n$ denotes the Borel field generated by $S_1, \ldots, S_n$:

\begin{align*}
\mathscr{E}\{Z_{n+1} \mid \mathscr{F}_n\}
 &= \mathscr{E}\Bigl\{ \frac{e^{itS_n}}{\varphi_n(t)} \cdot \frac{e^{itX_{n+1}}}{f_{n+1}(t)} \Bigm| \mathscr{F}_n \Bigr\} \\
 &= \frac{e^{itS_n}}{\varphi_n(t)} \, \mathscr{E}\Bigl\{ \frac{e^{itX_{n+1}}}{f_{n+1}(t)} \Bigm| \mathscr{F}_n \Bigr\}
  = \frac{e^{itS_n}}{\varphi_n(t)} \cdot \frac{f_{n+1}(t)}{f_{n+1}(t)} = Z_n,
\end{align*}

where the second equation follows from Theorem 9.1.3 and the third from
independence. Thus $\{Z_n, \mathscr{F}_n\}$ is a martingale, in the sense that its real and
imaginary parts are both martingales. Since it is uniformly bounded, it follows
from Theorem 9.4.4 that $Z_n$ converges a.e. This means, for each $t$ with $|t| \le t_0$,
there is a set $\Omega_t$ with $\mathscr{P}(\Omega_t) = 1$ such that if $\omega \in \Omega_t$, then the sequence of
complex numbers $e^{itS_n(\omega)}/\varphi_n(t)$ converges and so also does $e^{itS_n(\omega)}$. But how
does one deduce from this the convergence of $S_n(\omega)$? The argument below
may seem unnecessarily tedious, but it is of a familiar and indispensable kind
in certain parts of stochastic processes.

Consider $e^{itS_n(\omega)}$ as a function of $(t, \omega)$ in the product space $T \times \Omega$, where
$T = [-t_0, t_0]$, with the product measure $m \times \mathscr{P}$, where $m$ is the Lebesgue
measure on $T$. Since this function is measurable in $(t, \omega)$ for each $n$, the
set $C$ of $(t, \omega)$ for which $\lim_{n} e^{itS_n(\omega)}$ exists is measurable with respect to
$m \times \mathscr{P}$. Each section of $C$ by a fixed $t$ has full measure $\mathscr{P}(\Omega_t) = 1$ as just
shown, hence Fubini's theorem asserts that almost every section of $C$ by a
fixed $\omega$ must also have full measure $m(T) = 2t_0$. This means that there exists
an $\tilde{\Omega}$ with $\mathscr{P}(\tilde{\Omega}) = 1$, and for each $\omega \in \tilde{\Omega}$ there is a subset $T_\omega$ of $T$ with
$m(T_\omega) = m(T)$, such that if $t \in T_\omega$, then $\lim_{n \to \infty} e^{itS_n(\omega)}$ exists. Now we are
in a position to apply Exercise 17 of Sec. 6.4 to conclude the convergence of
$S_n(\omega)$ for $\omega \in \tilde{\Omega}$, thus finishing the proof of the theorem.

According to the preceding proof, due to Doob, the hypothesis of 
Theorem 9.5.5 may be further weakened to that the sequence of ch.f.'s of
$S_n$ converges on a set of $t$ of strictly positive Lebesgue measure. In particular,
if an infinite product $\prod_n f_n$ of ch.f.'s converges on such a set, then it converges
everywhere. 
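The martingale property of the $Z_n$ in (8) can be verified numerically in a simple case. The sketch below assumes, purely for illustration, that $X_j = \pm a_j$ with probability $1/2$ each, so that $f_j(t) = \cos(ta_j)$; the coefficients $a_j = 1/j$ and the name `martingale_gap` are hypothetical choices. For every sign pattern of the first $n$ steps it averages $Z_{n+1}$ over the next sign and compares the result with $Z_n$.

```python
import cmath
import math
from itertools import product

def martingale_gap(t, coeffs):
    """Largest |E{Z_{n+1} | first n signs} - Z_n| over all sign patterns, n = len(coeffs) - 1."""
    *head, last = coeffs
    phi_n = math.prod(math.cos(t * a) for a in head)   # ch.f. of S_n at t
    phi_next = phi_n * math.cos(t * last)              # ch.f. of S_{n+1} at t
    gap = 0.0
    for signs in product((-1, 1), repeat=len(head)):
        s_n = sum(e * a for e, a in zip(signs, head))
        z_n = cmath.exp(1j * t * s_n) / phi_n
        # Average Z_{n+1} over the two equally likely values of X_{n+1} = +-last.
        z_next_avg = sum(cmath.exp(1j * t * (s_n + e * last)) for e in (-1, 1)) / (2 * phi_next)
        gap = max(gap, abs(z_next_avg - z_n))
    return gap

coeffs = [1 / j for j in range(1, 10)]   # a_j = 1/j, j = 1, ..., 9
gap = martingale_gap(0.5, coeffs)
print(gap)
```

The value $t = 0.5$ keeps every $\cos(ta_j)$ strictly positive, which plays the role of the condition $\varphi(t) \ne 0$ for $|t| \le t_0$ in the proof; the gap should vanish up to rounding error.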




(V) The strong law of large numbers 


Our next example is a new proof of the classical strong law of large numbers 
in the form of Theorem 5.4.2, (8). This basically different approach, which 
has more a measure-theoretic than an analytic flavor, is one of the striking 
successes of martingale theory. It was given by Doob (1949). 


Theorem 9.5.6. Let $\{S_n, n \in N\}$ be a random walk (in the sense of
Chapter 8) with $\mathscr{E}\{|S_1|\} < \infty$. Then we have

\[ \lim_{n \to \infty} \frac{S_n}{n} = \mathscr{E}\{S_1\} \quad \text{a.e.} \]

PROOF. Recall that $S_n = \sum_{j=1}^{n} X_j$ and consider for $1 \le k \le n$:

\[ \mathscr{E}\{X_k \mid S_n, S_{n+1}, \ldots\} = \mathscr{E}\{X_k \mid \mathscr{G}_n\}, \tag{9} \]

where $\mathscr{G}_n$ is the Borel field generated by $\{S_j, j \ge n\}$. Thus

\[ \mathscr{G}_n \downarrow \mathscr{G} = \bigcap_{n \in N} \mathscr{G}_n \]

as $n$ increases. By Theorem 9.4.8, (15b), the right side of (9) converges to
$\mathscr{E}\{X_k \mid \mathscr{G}\}$. Now $\mathscr{G}_n$ is also generated by $S_n$ and $\{X_j, j \ge n+1\}$, hence it
follows from the independence of the latter from the pair $(X_k, S_n)$ and (3) of
Sec. 9.2 that we have

\[ \mathscr{E}\{X_k \mid S_n, X_{n+1}, X_{n+2}, \ldots\} = \mathscr{E}\{X_k \mid S_n\} = \mathscr{E}\{X_1 \mid S_n\}, \tag{10} \]

the second equation by reason of symmetry (proof?). Summing over $k$ from
1 to $n$ and taking the average, we infer that

\[ \frac{S_n}{n} = \mathscr{E}\{X_1 \mid \mathscr{G}_n\} \]

so that if $Y_{-n} = S_n/n$ for $n \in N$, $\{Y_n, n \in -N\}$ is a martingale. In particular,

\[ \lim_{n \to \infty} \frac{S_n}{n} = \lim_{n \to \infty} \mathscr{E}\{X_1 \mid \mathscr{G}_n\} = \mathscr{E}\{X_1 \mid \mathscr{G}\}, \]

where the second equation follows from the argument above. On the other
hand, the first limit is a remote (even invariant) r.v. in the sense of Sec. 8.1,
since for every $m \ge 1$ we have

\[ \lim_{n \to \infty} \frac{S_n(\omega)}{n} = \lim_{n \to \infty} \frac{S_n(\omega) - S_m(\omega)}{n}; \]

hence it must be a constant a.e. by Theorem 8.1.2. [Alternatively we may use
Theorem 8.1.4, the limit above being even more obviously a permutable r.v.]
This constant must be $\mathscr{E}\{\mathscr{E}\{X_1 \mid \mathscr{G}\}\} = \mathscr{E}\{X_1\}$, proving the theorem.
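The symmetry step behind (10), namely $\mathscr{E}\{X_1 \mid S_n\} = S_n/n$, can be confirmed by brute force in a finite case. The sketch below assumes i.i.d. steps uniform on a small set of integers (the set `(0, 1, 3)` and the function name are illustrative choices, not from the text) and computes the conditional mean of $X_1$ on each level set of $S_n$ exactly.

```python
from fractions import Fraction
from itertools import product

def conditional_mean_of_first(n, values=(0, 1, 3)):
    """Map each attainable s to E{X_1 | S_n = s} for i.i.d. X_j uniform on `values`."""
    sums = {}     # s -> sum of X_1 over the outcomes with S_n = s
    counts = {}   # s -> number of such (equally likely) outcomes
    for xs in product(values, repeat=n):
        s = sum(xs)
        sums[s] = sums.get(s, 0) + xs[0]
        counts[s] = counts.get(s, 0) + 1
    return {s: Fraction(sums[s], counts[s]) for s in sums}

n = 5
cond = conditional_mean_of_first(n)
symmetric = all(cond[s] == Fraction(s, n) for s in cond)
print(symmetric)
```

Since the $n$ coordinates play interchangeable roles on each level set $\{S_n = s\}$, each must carry the same conditional mean $s/n$, which is what the check reports.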


(VI) Exchangeable events 


The method used in the preceding example can also be applied to the theory 
of exchangeable events. The events $\{E_n, n \in N\}$ are said to be exchangeable
iff for every $k \ge 1$, the probability of the joint occurrence of any $k$ of them is
the same for every choice of $k$ members from the sequence, namely we have

\[ \mathscr{P}\{E_{n_1} \cap \cdots \cap E_{n_k}\} = w_k, \quad k \in N; \tag{11} \]

for any subset $\{n_1, \ldots, n_k\}$ of $N$. Let us denote the indicator of $E_n$ by $e_n$,
and put

\[ N_n = \sum_{j=1}^{n} e_j; \]

then $N_n$ is the number of occurrences among the first $n$ events of the sequence.
Denote by $\mathscr{G}_n$ the B.F. generated by $\{N_j, j \ge n\}$, and

\[ \mathscr{G} = \bigcap_{n \in N} \mathscr{G}_n. \]


Then the definition of exchangeability implies that if $n_j \le n$ for $1 \le j \le k$,
then

\[ \mathscr{E}\Bigl( \prod_{j=1}^{k} e_{n_j} \Bigm| \mathscr{G}_n \Bigr) = \mathscr{E}\Bigl( \prod_{j=1}^{k} e_{n_j} \Bigm| N_n \Bigr) \tag{12} \]

and that this conditional expectation is the same for any subset $(n_1, \ldots, n_k)$
of $(1, \ldots, n)$. Put then $f_{n0} = 1$ and

\[ f_{nk} = \sum e_{n_1} \cdots e_{n_k}, \quad 1 \le k \le n, \]

where the sum is extended over all $\binom{n}{k}$ choices; this is the "elementary
symmetric function" of degree $k$ formed by $e_1, \ldots, e_n$. Introducing an
indeterminate $z$ we have the formal identity in $z$:

\[ \sum_{j=0}^{n} f_{nj} z^j = \prod_{j=1}^{n} (1 + e_j z). \]

But it is trivial that $1 + e_j z = (1 + z)^{e_j}$ since $e_j$ takes only the values 0 and
1, hence*

\[ \sum_{j=0}^{n} f_{nj} z^j = \prod_{j=1}^{n} (1 + z)^{e_j} = (1 + z)^{N_n}. \]

From this we obtain by comparing the coefficients:

\[ f_{nk} = \binom{N_n}{k}, \quad 0 \le k \le n. \tag{13} \]

It follows that the right member of (12) is equal to

\[ \binom{N_n}{k} \Bigm/ \binom{n}{k}. \]

Letting $n \to \infty$ and using Theorem 9.4.8 (15b) in the left member of (12),
we conclude that

\[ \mathscr{E}\Bigl( \prod_{j=1}^{k} e_{n_j} \Bigm| \mathscr{G} \Bigr) = \lim_{n \to \infty} \Bigl( \frac{N_n}{n} \Bigr)^k. \tag{14} \]

This is the key to the theory. It shows first that the limit below exists almost
surely:

\[ \lim_{n \to \infty} \frac{N_n}{n} = \eta, \]

and clearly $\eta$ is a r.v. satisfying $0 \le \eta \le 1$. Going back to (14) we have
established the formula

\[ \mathscr{P}(E_{n_1} \cap \cdots \cap E_{n_k} \mid \mathscr{G}) = \eta^k, \quad k \in N; \tag{14$'$} \]

and taking expectations we have identified the $w_k$ in (11):

\[ w_k = \mathscr{E}(\eta^k). \]

Thus $\{w_k, k \in N\}$ is the sequence of moments of the distribution of $\eta$. This is
de Finetti's theorem, as proved by D. G. Kendall. We leave some easy
consequences as exercises below. An interesting classical example of exchangeable
events is Pólya's urn scheme, see Rényi [24], and Chung [25].
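The combinatorial identity (13) is easy to confirm by enumeration. The following sketch (the helper names are hypothetical) computes the elementary symmetric function $f_{nk}$ directly from its definition for every 0-1 vector $(e_1, \ldots, e_n)$ and compares it with the binomial coefficient $\binom{N_n}{k}$.

```python
from itertools import combinations, product
from math import comb, prod

def elementary_symmetric(e, k):
    """f_{nk}: sum of the products e_{n_1} ... e_{n_k} over all k-subsets of indices."""
    return sum(prod(e[i] for i in idx) for idx in combinations(range(len(e)), k))

def identity_13_holds(n):
    """Check f_{nk} = C(N_n, k) for every 0-1 vector of length n and every 0 <= k <= n."""
    for e in product((0, 1), repeat=n):
        occurrences = sum(e)   # N_n
        for k in range(n + 1):
            if elementary_symmetric(e, k) != comb(occurrences, k):
                return False
    return True

ok = identity_13_holds(6)
ratio = comb(3, 2) / comb(6, 2)   # right member of (12) when N_n = 3, n = 6, k = 2
print(ok, ratio)
```

For $k = 0$ the empty product gives $f_{n0} = 1$, matching the convention in the text, and for $k > N_n$ both sides are zero.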


*I owe this derivation to David Klarner.


(VII) Squared variation

Here is a small sample of the latest developments in martingale theory. Let
$X = \{X_n, \mathscr{F}_n\}$ be a martingale; using the notation in (7) of Sec. 9.3, we put

\[ Q_n^2 = Q_n^2(X) = \sum_{j=1}^{n} x_j^2. \]

The sequence $\{Q_n^2(X), n \in N\}$ associated with $X$ is called its squared
variation process and is useful in many applications. We begin with the algebraic
identity:

\[ X_n^2 = \Bigl( \sum_{j=1}^{n} x_j \Bigr)^2 = \sum_{j=1}^{n} x_j^2 + 2 \sum_{j=2}^{n} X_{j-1} x_j. \tag{15} \]

If $X_n \in L^2$ for every $n$, then all terms above are integrable, and we have for
each $j$:

\[ \mathscr{E}(X_{j-1} x_j) = \mathscr{E}(X_{j-1} \mathscr{E}(x_j \mid \mathscr{F}_{j-1})) = 0. \tag{16} \]

It follows that

\[ \mathscr{E}(X_n^2) = \sum_{j=1}^{n} \mathscr{E}(x_j^2) = \mathscr{E}(Q_n^2). \]

When $X_n$ is the $n$th partial sum of a sequence of independent r.v.'s with zero
mean and finite variance, the preceding formula reduces to the additivity of
variances; see (6) of Sec. 5.1.
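To see the relation $\mathscr{E}(X_n^2) = \mathscr{E}(Q_n^2)$ at work for a martingale whose increments are not independent, consider the following sketch. The martingale is a hypothetical example chosen for illustration: $x_j = \varepsilon_j (1 + |X_{j-1}|)$ with fair signs $\varepsilon_j = \pm 1$, so that the size of each increment depends on the past while its conditional mean is zero. Both expectations are computed exactly over all $2^n$ sign patterns.

```python
from fractions import Fraction
from itertools import product

def second_moments(n):
    """Return (E[X_n^2], E[Q_n^2]) for X_j = X_{j-1} + eps_j * (1 + |X_{j-1}|), X_0 = 0."""
    total_x_sq = 0
    total_q_sq = 0
    for eps in product((-1, 1), repeat=n):
        x = 0
        q_sq = 0
        for e in eps:
            increment = e * (1 + abs(x))   # x_j: conditional mean zero given the past
            x += increment
            q_sq += increment ** 2         # running squared variation Q_j^2
        total_x_sq += x ** 2
        total_q_sq += q_sq
    paths = 2 ** n
    return Fraction(total_x_sq, paths), Fraction(total_q_sq, paths)

ex2, eq2 = second_moments(8)
print(ex2, eq2)
```

Path by path the two quantities $X_n^2$ and $Q_n^2$ differ by the cross terms in (15); only their expectations agree, which is the content of (16).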

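Now suppose that $\{X_n\}$ is a positive bounded supermartingale such that
$0 \le X_n \le \lambda$ for all $n$, where $\lambda$ is a constant. Then the quantity of (16) is
negative and bounded below by

\[ \mathscr{E}(\lambda \mathscr{E}(x_j \mid \mathscr{F}_{j-1})) = \lambda \mathscr{E}(x_j) \le 0. \]

In this case we obtain from (15):

\[ \lambda \mathscr{E}(X_n) \ge \mathscr{E}(X_n^2) \ge \mathscr{E}(Q_n^2) + 2\lambda \sum_{j=2}^{n} \mathscr{E}(x_j) = \mathscr{E}(Q_n^2) + 2\lambda[\mathscr{E}(X_n) - \mathscr{E}(X_1)]; \]

so that

\[ \mathscr{E}(Q_n^2) \le 2\lambda \mathscr{E}(X_1) \le 2\lambda^2. \tag{17} \]

If $X$ is a positive martingale, then $X \wedge \lambda$ is a supermartingale of the kind
just considered so that (17) is applicable to it. Letting $X^* = \sup_{1 \le n < \infty} X_n$,

\[ \mathscr{P}\{Q_n(X) \ge \lambda\} \le \mathscr{P}\{X^* > \lambda\} + \mathscr{P}\{X^* \le \lambda;\; Q_n(X \wedge \lambda) \ge \lambda\}. \]

By Theorem 9.4.1, the first term on the right is bounded by $\lambda^{-1} \mathscr{E}(X_1)$. The
second term may be estimated by Chebyshev's inequality followed by (17)
applied to $X \wedge \lambda$:

\[ \mathscr{P}\{Q_n(X \wedge \lambda) \ge \lambda\} \le \frac{1}{\lambda^2} \mathscr{E}\{Q_n^2(X \wedge \lambda)\} \le \frac{2}{\lambda} \mathscr{E}(X_1). \]

We have therefore established the inequality:

\[ \mathscr{P}\{Q_n(X) \ge \lambda\} \le \frac{3}{\lambda} \mathscr{E}(X_1) \tag{18} \]

for any positive martingale. Letting $n \to \infty$ and then $\lambda \to \infty$, we obtain

\[ \sum_{j=1}^{\infty} x_j^2 = \lim_{n \to \infty} Q_n^2(X) < \infty \quad \text{a.e.} \]

Using Krickeberg's decomposition (Exercise 5 of Sec. 9.4) the last result
extends at once to any $L^1$-bounded martingale. This was first proved by D. G.
Austin. Similarly, the inequality (18) extends to any $L^1$-bounded martingale
as follows:

\[ \mathscr{P}\{Q_n(X) \ge \lambda\} \le \frac{6}{\lambda} \sup_n \mathscr{E}(|X_n|). \tag{19} \]

The details are left as an exercise. This result is due to D. Burkholder. The
simplified proofs given above are due to A. Garsia.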


(VIII) Derivation


Our final example is a feedback to the beginning of this chapter, namely to
use martingale theory to obtain a Radon-Nikodym derivative. Let $Y$ be an
integrable r.v. and consider, as in the proof of Theorem 9.1.1, the countably
additive set function $\nu$ below:

\[ \nu(\Lambda) = \int_\Lambda Y \, d\mathscr{P}. \tag{20} \]

For any countable measurable partition $\{\Lambda_j^{(n)}, j \in N\}$ of $\Omega$, let $\mathscr{F}_n$ be the Borel
field generated by it. Define the approximating function $X_n$ as follows:

\[ X_n = \sum_j \frac{\nu(\Lambda_j^{(n)})}{\mathscr{P}(\Lambda_j^{(n)})} \, 1_{\Lambda_j^{(n)}}, \tag{21} \]

where the fraction is taken to be zero if the denominator vanishes. According
to the discussion of Sec. 9.1, we have

\[ X_n = \mathscr{E}\{Y \mid \mathscr{F}_n\}. \]

Now suppose that the partitions become finer as $n$ increases so that $\{\mathscr{F}_n, n \in N\}$
is an increasing sequence of Borel fields. Then we obtain by Theorem 9.4.8:

\[ \lim_{n \to \infty} X_n = \mathscr{E}\{Y \mid \mathscr{F}_\infty\}. \]

In particular if $Y \in \mathscr{F}_\infty$, we have obtained the Radon-Nikodym derivative
$Y = d\nu/d\mathscr{P}$ as the almost everywhere limit of "increment ratios" over a "net":

\[ Y(\omega) = \lim_{n \to \infty} \frac{\nu(\Lambda_{j(\omega)}^{(n)})}{\mathscr{P}(\Lambda_{j(\omega)}^{(n)})}, \]

where $j(\omega)$ is the unique $j$ such that $\omega \in \Lambda_j^{(n)}$.

If $(\Omega, \mathscr{F}, \mathscr{P})$ is $(\mathscr{U}, \mathscr{B}, m)$ as in Example 2 of Sec. 3.1, and $\nu$ is an
arbitrary measure which is absolutely continuous with respect to the Lebesgue
measure $m$, we may take the $n$th partition to be $0 = \xi_0^{(n)} < \cdots < \xi_n^{(n)} = 1$
such that

\[ \max_{0 \le k \le n-1} \bigl( \xi_{k+1}^{(n)} - \xi_k^{(n)} \bigr) \to 0. \]

For in this case $\mathscr{F}_\infty$ will contain each open interval and so also $\mathscr{B}$. If $\nu$ is not
absolutely continuous, the procedure above will lead to the derivative of its
absolutely continuous part (see Exercise 14 below). In particular, if $F$ is the
d.f. associated with the p.m. $\nu$, and we put

\[ f_n(x) = 2^n \Bigl[ F\Bigl( \frac{k+1}{2^n} \Bigr) - F\Bigl( \frac{k}{2^n} \Bigr) \Bigr] \quad \text{for } \frac{k}{2^n} < x \le \frac{k+1}{2^n}, \]

where $k$ ranges over all integers, then we have

\[ \lim_{n \to \infty} f_n(x) = F'(x) \]

for almost all $x$ with respect to $m$; and $F'$ is the density of the absolutely
continuous part of $F$; see Theorem 1.3.1. So we have come around to the
beginning of this course, and the book is hereby ended.
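The dyadic increment ratios $f_n$ can be computed directly. The sketch below takes, as an assumed example not found in the text, the d.f. $F(x) = x^2$ on $[0, 1]$, whose density is $F'(x) = 2x$; the name `dyadic_ratio` is hypothetical. For this $F$ one has $f_n(x) = (2k+1)/2^n$, so the error $|f_n(x) - 2x|$ is at most $2^{-n}$.

```python
import math

def dyadic_ratio(F, x, n):
    """f_n(x) = 2^n [F((k+1)/2^n) - F(k/2^n)] for the k with k/2^n < x <= (k+1)/2^n."""
    scale = 2 ** n
    k = math.ceil(x * scale) - 1
    return scale * (F((k + 1) / scale) - F(k / scale))

def F(x):
    """Assumed d.f. on [0, 1] with density 2x."""
    return x * x

x = 0.3
levels = (2, 6, 10, 20)
errors = [abs(dyadic_ratio(F, x, n) - 2 * x) for n in levels]
print(errors)
```

Here the convergence holds at every $x$ in $(0, 1]$ because $F$ is continuously differentiable; the theorem only promises it for almost all $x$.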


EXERCISES 


*1. Suppose that $\{X_n, n \in N\}$ is a sequence of integer-valued r.v.'s having
the following property. For each $n$, there exists a function $p_n$ of $n$ integers
such that for every $k \in N$, we have

\[ \mathscr{P}\{X_{k+j} = x_j, 1 \le j \le n\} = p_n(x_1, \ldots, x_n). \]

Define for a fixed $x_0$:

\[ Z_n = \frac{p_{n+1}(x_0, X_1, \ldots, X_n)}{p_n(X_1, \ldots, X_n)} \]

if the denominator $> 0$; otherwise $Z_n = 0$. Then $\{Z_n, n \in N\}$ is a martingale
that converges a.e. and in $L^1$. [This is from information theory.]
2. Suppose that for each $n$, $\{X_j, 1 \le j \le n\}$ and $\{X'_j, 1 \le j \le n\}$ have
respectively the $n$-dimensional probability density functions $p_n$ and $q_n$. Define

\[ Y_n = \frac{q_n(X_1, \ldots, X_n)}{p_n(X_1, \ldots, X_n)} \]

if the denominator $> 0$ and $= 0$ otherwise. Then $\{Y_n, n \in N\}$ is a
supermartingale that converges a.e. [This is from statistics.]

3. Let $\{Z_n, n \in N^0\}$ be positive integer-valued r.v.'s such that $Z_0 \equiv 1$
and for each $n \ge 1$, the conditional distribution of $Z_n$ given $Z_0, \ldots, Z_{n-1}$ is
that of $Z_{n-1}$ independent r.v.'s with the common distribution $\{p_k, k \in N^0\}$,
where $p_1 < 1$ and

\[ 0 < m = \sum_{k=0}^{\infty} k p_k < \infty. \]

Then $\{W_n, n \in N^0\}$, where $W_n = Z_n/m^n$, is a martingale that converges, the
limit being zero if $m \le 1$. [This is from branching process.]
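A quick sanity check related to Exercise 3, not a solution of it: under the usual reading that $Z_n$ is the sum of $Z_{n-1}$ independent offspring counts, the sketch below computes the exact distribution of $Z_n$ for a few generations and confirms that $\mathscr{E}(W_n) = 1$, a necessary consequence of the martingale property. The offspring distribution and all function names are hypothetical.

```python
from fractions import Fraction

def convolve(p, q):
    """pmf of the sum of two independent nonnegative-integer r.v.'s with pmfs p and q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def next_generation(z_pmf, offspring):
    """pmf of Z_n from the pmf of Z_{n-1}: each individual reproduces independently."""
    out = {}
    sum_pmf = {0: Fraction(1)}   # pmf of a sum of zero offspring counts
    for size in range(max(z_pmf) + 1):
        if size > 0:
            sum_pmf = convolve(sum_pmf, offspring)   # now the pmf of `size` offspring counts
        weight = z_pmf.get(size, 0)
        if weight:
            for total, prob in sum_pmf.items():
                out[total] = out.get(total, 0) + weight * prob
    return out

offspring = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}   # assumed {p_k}
m = sum(k * p for k, p in offspring.items())                            # m = 5/4
z = {1: Fraction(1)}                                                     # Z_0 = 1
means_of_W = []
for n in range(1, 5):
    z = next_generation(z, offspring)
    mean_z = sum(k * p for k, p in z.items())
    means_of_W.append(mean_z / m ** n)
print(means_of_W)
```

The check confirms only that the expectations are constant; the exercise itself asks for the conditional statement $\mathscr{E}(W_n \mid Z_0, \ldots, Z_{n-1}) = W_{n-1}$.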

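4. Let $\{X_n, n \in N\}$ be an arbitrary stochastic process and let $\mathscr{F}'_n$ be as in
Sec. 8.1. Prove that the remote field is almost trivial if and only if for each
$\Lambda \in \mathscr{F}_\infty$ we have

\[ \lim_{n \to \infty} \sup_{M \in \mathscr{F}'_n} |\mathscr{P}(\Lambda M) - \mathscr{P}(\Lambda)\mathscr{P}(M)| = 0. \]

[HINT: Consider $\mathscr{P}(\Lambda \mid \mathscr{F}'_n)$ and apply 9.4.8. This is due to Blackwell and
Freedman.]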
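*5. In the notation of Theorem 9.5.2, suppose that there exists $\delta > 0$ such
that

\[ \mathscr{P}\{X_j \in B_j \text{ i.o.} \mid X_n\} \le 1 - \delta \quad \text{a.e. on } \{X_n \in A_n\}; \]

then we have

\[ \mathscr{P}\{X_j \in A_j \text{ i.o. and } X_j \in B_j \text{ i.o.}\} = 0. \]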


6. Let $f$ be a real bounded continuous function on $\mathscr{R}^1$ and $\mu$ a p.m. on
$\mathscr{R}^1$ such that

\[ \forall x \in \mathscr{R}^1: \quad f(x) = \int_{\mathscr{R}^1} f(x + y) \mu(dy). \]

Then $f(x + s) = f(x)$ for each $s$ in the support of $\mu$. In particular, if $\mu$ is not of
the lattice type, then $f$ is constant everywhere. [This is due to G. A. Hunt, who
used it to prove renewal limit theorems. The approach was later rediscovered
by other authors, and the above result in somewhat more general context is
now referred to as Choquet-Deny's theorem.]
*7. The analogue of Theorem 9.5.4 for $p = 1$ is as follows: if $X_n \ge 0$ for
all $n$, then

\[ \mathscr{E}\{\sup_n X_n\} \le \frac{e}{e - 1} \bigl[ 1 + \sup_n \mathscr{E}\{X_n \log^+ X_n\} \bigr], \]

where $\log^+ x = \log x$ if $x \ge 1$ and 0 if $x \le 1$.
*8. As an example of a more sophisticated application of the martingale
convergence theorem, consider the following result due to Paul Lévy. Let
$\{X_n, n \in N^0\}$ be a sequence of uniformly bounded r.v.'s, then the two series

\[ \sum_n X_n \quad \text{and} \quad \sum_n \mathscr{E}\{X_n \mid X_1, \ldots, X_{n-1}\} \]

converge or diverge together. [HINT: Let

\[ Y_n = X_n - \mathscr{E}\{X_n \mid X_1, \ldots, X_{n-1}\} \quad \text{and} \quad Z_n = \sum_{j=1}^{n} Y_j. \]

Define $\alpha$ to be the first time that $Z_n > A$ and show that $\mathscr{E}(Z^+_{\alpha \wedge n})$ is bounded
in $n$. Apply Theorem 9.4.4 to $\{Z_{\alpha \wedge n}\}$ for each $A$ to show that $Z_n$ converges on
the set where $\overline{\lim}_n Z_n < \infty$; similarly also on the set where $\underline{\lim}_n Z_n > -\infty$.
The situation is reminiscent of Theorem 8.2.5.]

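9. Let $\{Y_k, 1 \le k \le n\}$ be independent r.v.'s with mean zero and finite
variances $\sigma_k^2$;

\[ S_k = \sum_{j=1}^{k} Y_j, \quad s_k^2 = \sum_{j=1}^{k} \sigma_j^2 > 0, \quad Z_k = S_k^2 - s_k^2. \]

Prove that $\{Z_k, 1 \le k \le n\}$ is a martingale. Suppose now all $Y_k$ are bounded
by a constant $A$, and define $\alpha$ and $M$ as in the proof of Theorem 9.4.1, with
the $X_k$ there replaced by the $S_k$ here. Prove that

\[ s_n^2 \mathscr{P}(M^c) \le \mathscr{E}(s_\alpha^2) = \mathscr{E}(S_\alpha^2) \le (\lambda + A)^2. \]

Thus we obtain

\[ \mathscr{P}\{\max_{1 \le k \le n} |S_k| \le \lambda\} \le \frac{(\lambda + A)^2}{s_n^2}, \]

an improvement on Exercise 3 of Sec. 5.3. [This is communicated by Doob.]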

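10. Let $\{X_n, n \in N\}$ be a sequence of independent, identically distributed
r.v.'s with $\mathscr{E}(|X_1|) < \infty$; and $S_n = \sum_{j=1}^{n} X_j$. Define
$\alpha = \inf\{n \ge 1: |X_n| > n\}$. Prove that if $\mathscr{E}((|S_\alpha|/\alpha) 1_{\{\alpha < \infty\}}) < \infty$, then
$\mathscr{E}(|X_1| \log^+ |X_1|) < \infty$. This is due to McCabe and Shepp. [HINT:

\[ c_n = \prod_{j=1}^{n} \mathscr{P}\{|X_j| \le j\} \to c > 0; \]

\[ \sum_{n=1}^{\infty} \frac{1}{n} \int_{\{\alpha = n\}} |X_n| \, d\mathscr{P} = \sum_{n=1}^{\infty} \frac{c_{n-1}}{n} \int_{\{|X_1| > n\}} |X_1| \, d\mathscr{P}; \]

\[ \sum_{n=1}^{\infty} \frac{1}{n} \int_{\{\alpha = n\}} |S_{n-1}| \, d\mathscr{P} < \infty.] \]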


11. Deduce from Exercise 10 that $\mathscr{E}(\sup_n |S_n|/n) < \infty$ if and only
if $\mathscr{E}(|X_1| \log^+ |X_1|) < \infty$. [HINT: Apply Exercise 7 to the martingale
$\{\ldots, S_n/n, \ldots, S_2/2, S_1\}$ in Example (V).]

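12. In Example (VI) show that (i) $\mathscr{G}$ is generated by $\eta$; (ii) the events
$\{E_n, n \in N\}$ are conditionally independent given $\eta$; (iii) for any $l$ events $E_{n_j}$,
$1 \le j \le l$ and any $k \le l$ we have

\[ \mathscr{P}\{E_{n_1} \cap \cdots \cap E_{n_k} \cap E^c_{n_{k+1}} \cap \cdots \cap E^c_{n_l}\} = \int_0^1 x^k (1 - x)^{l-k} \, G(dx) \]

where $G$ is the distribution of $\eta$.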
13. Prove the inequality (19). 
*14. Prove that if $\nu$ is a measure on $\mathscr{F}_\infty$ that is singular with respect to
$\mathscr{P}$, then the $X_n$'s in (21) converge a.e. to zero. [HINT: Show that

\[ \nu(\Lambda) \ge \int_\Lambda X_n \, d\mathscr{P} \quad \text{for } \Lambda \in \mathscr{F}_m,\; m \le n, \]

and apply Fatou's lemma. $\{X_n\}$ is a supermartingale, not necessarily a
martingale!]

15. In the case of $(\mathscr{U}, \mathscr{B}, m)$, suppose that $\nu = \delta_1$ and the $n$th partition
is obtained by dividing $\mathscr{U}$ into $2^n$ equal parts: what are the $X_n$'s in (21)? Use
this to "explain away" the St. Petersburg paradox (see Exercise 5 of Sec. 5.2).


Bibliographical Note 


Most of the results can be found in Chapter 7 of Doob [17]. Another useful account is 
given by Meyer [20]. For an early but stimulating account of the connections between 
random walks and partial differential equations, see 


A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer- 
Verlag, Berlin, 1933. 
Theorems 9.5.1 and 9.5.2 are contained in


Kai Lai Chung, The general theory of Markov processes according to Doeblin, 
Z. Wahrscheinlichkeitstheorie 2 (1964), 230-254. 


For Theorem 9.5.4 in the case of an independent process, see 


J. Marcinkiewicz and A. Zygmund, Sur les fonctions indépendantes, Fund. Math.
29 (1937), 60-90,


which contains other interesting and useful relics. 
The following article serves as a guide to the recent literature on martingale 
inequalities and their uses: 


D. L. Burkholder, Distribution function inequalities for martingales, Ann. Proba- 
bility 1 (1973), 19-42. 


Supplement: Measure and 
Integral 


For basic mathematical vocabulary and notation the reader is referred to §1.1 
and §2.1 of the main text. 


1 Construction of measure 


Let $\Omega$ be an abstract space and $\mathscr{J}$ its total Borel field, then $A \in \mathscr{J}$ means
$A \subset \Omega$.


DEFINITION 1. A function $\mu^*$ with domain $\mathscr{J}$ and range in $[0, \infty]$ is an
outer measure iff the following properties hold:

(a) $\mu^*(\emptyset) = 0$;

(b) (monotonicity) if $A_1 \subset A_2$, then $\mu^*(A_1) \le \mu^*(A_2)$;

(c) (subadditivity) if $\{A_j\}$ is a countable sequence of sets in $\mathscr{J}$, then

\[ \mu^*\Bigl( \bigcup_j A_j \Bigr) \le \sum_j \mu^*(A_j). \]
J j 


DEFINITION 2. Let $\mathscr{F}_0$ be a field in $\Omega$. A function $\mu$ with domain $\mathscr{F}_0$ and
range in $[0, \infty]$ is a measure on $\mathscr{F}_0$ iff (a) and the following property hold:

(d) (additivity) if $\{B_j\}$ is a countable sequence of disjoint sets in $\mathscr{F}_0$ and
$\bigcup_j B_j \in \mathscr{F}_0$, then

\[ \mu\Bigl( \bigcup_j B_j \Bigr) = \sum_j \mu(B_j). \tag{1} \]

Let us show that the properties (b) and (c) for outer measure hold for a measure
$\mu$, provided all the sets involved belong to $\mathscr{F}_0$.

If $A_1 \in \mathscr{F}_0$, $A_2 \in \mathscr{F}_0$, and $A_1 \subset A_2$, then $A_1^c A_2 \in \mathscr{F}_0$ because $\mathscr{F}_0$ is a field;
$A_2 = A_1 \cup A_1^c A_2$ and so by (d):

\[ \mu(A_2) = \mu(A_1) + \mu(A_1^c A_2) \ge \mu(A_1). \]

Next, if each $A_j \in \mathscr{F}_0$, and furthermore if $\bigcup_j A_j \in \mathscr{F}_0$ (this must be assumed
for a countably infinite union because it is not implied by the definition of a
field!), then

\[ \bigcup_j A_j = A_1 \cup A_1^c A_2 \cup A_1^c A_2^c A_3 \cup \cdots \]

and so by (d), since each member of the disjoint union above belongs to $\mathscr{F}_0$:

\[ \mu\Bigl( \bigcup_j A_j \Bigr) = \mu(A_1) + \mu(A_1^c A_2) + \mu(A_1^c A_2^c A_3) + \cdots \]

\[ \le \mu(A_1) + \mu(A_2) + \mu(A_3) + \cdots \]

by property (b) just proved.

The symbol $N$ denotes the sequence of natural numbers (strictly positive
integers); when used as index set, it will frequently be omitted as understood.
For instance, the index $j$ used above ranges over $N$ or a finite segment of $N$.

Now let us suppose that the field $\mathscr{F}_0$ is a Borel field to be denoted by $\mathscr{F}$
and that $\mu$ is a measure on it. Then if $A_n \in \mathscr{F}$ for each $n \in N$, the countable
union $\bigcup_n A_n$ and countable intersection $\bigcap_n A_n$ both belong to $\mathscr{F}$. In this case
we have the following fundamental properties.

(e) (increasing limit) if $A_n \subset A_{n+1}$ for all $n$ and $A_n \uparrow A = \bigcup_n A_n$, then

\[ \lim_n \uparrow \mu(A_n) = \mu(A). \]

(f) (decreasing limit) if $A_n \supset A_{n+1}$ for all $n$, $A_n \downarrow A = \bigcap_n A_n$, and for
some $n$ we have $\mu(A_n) < \infty$, then

\[ \lim_n \downarrow \mu(A_n) = \mu(A). \]
The additional assumption in (f) is essential. For a counterexample let
$A_n = (n, \infty)$ in $R$, then $A_n \downarrow \emptyset$ the empty set, but the measure (length!) of $A_n$
is $+\infty$ for all $n$, while $\emptyset$ surely must have measure 0. See §3 for formalities
of this trivial example. It can even be made discrete if we use the counting
measure $\#$ of natural numbers: let $A_n = \{n, n+1, n+2, \ldots\}$ so that $\#(A_n) = +\infty$,
$\#(\bigcap_n A_n) = 0$.

Beginning with a measure $\mu$ on a field $\mathscr{F}_0$, not a Borel field, we proceed
to construct a measure on the Borel field $\mathscr{F}$ generated by $\mathscr{F}_0$, namely the
minimal Borel field containing $\mathscr{F}_0$ (see §2.1). This is called an extension of
$\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$, when the notation $\mu$ is maintained. Curiously, we do this
by first constructing an outer measure $\mu^*$ on the total Borel field $\mathscr{J}$ and then
showing that $\mu^*$ is in truth a measure on a certain Borel field to be denoted
by $\mathscr{F}^*$ that contains $\mathscr{F}_0$. Then of course $\mathscr{F}^*$ must contain the minimal $\mathscr{F}$, and
so $\mu^*$ restricted to $\mathscr{F}$ is an extension of the original $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$. But we
have obtained a further extension to $\mathscr{F}^*$ that is in general "larger" than $\mathscr{F}$ and
possesses a further desirable property to be discussed.


DEFINITION 3. Given a measure $\mu$ on a field $\mathscr{F}_0$ in $\Omega$, we define $\mu^*$ on
$\mathscr{J}$ as follows, for any $A \in \mathscr{J}$:

\[ \mu^*(A) = \inf\Bigl\{ \sum_j \mu(B_j) \Bigm| B_j \in \mathscr{F}_0 \text{ for all } j \text{ and } \bigcup_j B_j \supset A \Bigr\}. \tag{2} \]

A countable (possibly finite) collection of sets $\{B_j\}$ satisfying the conditions
indicated in (2) will be referred to below as a "covering" of $A$. The infimum
taken over all such coverings exists because the single set $\Omega$ constitutes a
covering of $A$, so that

\[ 0 \le \mu^*(A) \le \mu^*(\Omega) \le +\infty. \]

It is not trivial that $\mu^*(A) = \mu(A)$ if $A \in \mathscr{F}_0$, which is part of the next theorem.


Theorem 1. We have $\mu^* = \mu$ on $\mathscr{F}_0$; $\mu^*$ on $\mathscr{J}$ is an outer measure.


PROOF. Let $A \in \mathscr{F}_0$, then the single set $A$ serves as a covering of $A$; hence
$\mu^*(A) \le \mu(A)$. For any covering $\{B_j\}$ of $A$, we have $AB_j \in \mathscr{F}_0$ and

\[ \bigcup_j AB_j = A \in \mathscr{F}_0; \]

hence by property (c) of $\mu$ on $\mathscr{F}_0$ followed by property (b):

\[ \mu(A) = \mu\Bigl( \bigcup_j AB_j \Bigr) \le \sum_j \mu(AB_j) \le \sum_j \mu(B_j). \]

It follows from (2) that $\mu(A) \le \mu^*(A)$. Thus $\mu^* = \mu$ on $\mathscr{F}_0$.

To prove $\mu^*$ is an outer measure, the properties (a) and (b) are trivial.
To prove (c), let $\epsilon > 0$. For each $j$, by the definition of $\mu^*(A_j)$, there exists a
covering $\{B_{jk}\}$ of $A_j$ such that

\[ \sum_k \mu(B_{jk}) \le \mu^*(A_j) + \frac{\epsilon}{2^j}. \]

The double sequence $\{B_{jk}\}$ is a covering of $\bigcup_j A_j$ such that

\[ \sum_j \sum_k \mu(B_{jk}) \le \sum_j \mu^*(A_j) + \epsilon. \]

Hence for any $\epsilon > 0$:

\[ \mu^*\Bigl( \bigcup_j A_j \Bigr) \le \sum_j \mu^*(A_j) + \epsilon \]

that establishes (c) for $\mu^*$, since $\epsilon$ is arbitrarily small.
With the outer measure $\mu^*$, a class of sets $\mathscr{F}^*$ is associated as follows.


DEFINITION 4. A set $A \subset \Omega$ belongs to $\mathscr{F}^*$ iff for every $Z \subset \Omega$ we have

\[ \mu^*(Z) = \mu^*(AZ) + \mu^*(A^c Z). \tag{3} \]

If in (3) we change "=" into "$\le$", the resulting inequality holds by (c); hence
(3) is equivalent to the reverse inequality when "=" is changed into "$\ge$".


Theorem 2. $\mathscr{F}^*$ is a Borel field and contains $\mathscr{F}_0$. On $\mathscr{F}^*$, $\mu^*$ is a measure.


PROOF. Let $A \in \mathscr{F}_0$. For any $Z \subset \Omega$ and any $\epsilon > 0$, there exists a covering
$\{B_j\}$ of $Z$ such that

\[ \sum_j \mu(B_j) \le \mu^*(Z) + \epsilon. \tag{4} \]

Since $AB_j \in \mathscr{F}_0$, $\{AB_j\}$ is a covering of $AZ$; $\{A^c B_j\}$ is a covering of $A^c Z$; hence

\[ \mu^*(AZ) \le \sum_j \mu(AB_j), \qquad \mu^*(A^c Z) \le \sum_j \mu(A^c B_j). \tag{5} \]

Since $\mu$ is a measure on $\mathscr{F}_0$, we have for each $j$:

\[ \mu(AB_j) + \mu(A^c B_j) = \mu(B_j). \tag{6} \]

It follows from (4), (5), and (6) that

\[ \mu^*(AZ) + \mu^*(A^c Z) \le \mu^*(Z) + \epsilon. \]

Letting $\epsilon \downarrow 0$ establishes the criterion (3) in its "$\ge$" form. Thus $A \in \mathscr{F}^*$, and
we have proved that $\mathscr{F}_0 \subset \mathscr{F}^*$.

To prove that $\mathscr{F}^*$ is a Borel field, it is trivial that it is closed under
complementation because the criterion (3) is unaltered when $A$ is changed
into $A^c$. Next, to show that $\mathscr{F}^*$ is closed under union, let $A \in \mathscr{F}^*$ and $B \in \mathscr{F}^*$.
Then for any $Z \subset \Omega$, we have by (3) with $A$ replaced by $B$ and $Z$ replaced by
$ZA$ or $ZA^c$:

\[ \mu^*(ZA) = \mu^*(ZAB) + \mu^*(ZAB^c); \]
\[ \mu^*(ZA^c) = \mu^*(ZA^cB) + \mu^*(ZA^cB^c). \]

Hence by (3) again as written:

\[ \mu^*(Z) = \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^cB) + \mu^*(ZA^cB^c). \]

Applying (3) with $Z$ replaced by $Z(A \cup B)$, we have

\begin{align*}
\mu^*(Z(A \cup B)) &= \mu^*(Z(A \cup B)A) + \mu^*(Z(A \cup B)A^c) \\
 &= \mu^*(ZA) + \mu^*(ZA^cB) \\
 &= \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^cB).
\end{align*}

Comparing the two preceding equations, we see that

\[ \mu^*(Z) = \mu^*(Z(A \cup B)) + \mu^*(Z(A \cup B)^c). \]

Hence $A \cup B \in \mathscr{F}^*$, and we have proved that $\mathscr{F}^*$ is a field.
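Now let $\{A_j\}$ be an infinite sequence of sets in $\mathscr{F}^*$; put

\[ B_1 = A_1, \qquad B_j = A_j \setminus \Bigl( \bigcup_{i=1}^{j-1} A_i \Bigr) \quad \text{for } j \ge 2. \]

Then $\{B_j\}$ is a sequence of disjoint sets in $\mathscr{F}^*$ (because $\mathscr{F}^*$ is a field) and has
the same union as $\{A_j\}$. For any $Z \subset \Omega$, we have for each $n \ge 1$:

\begin{align*}
\mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr) \Bigr)
 &= \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr) B_n \Bigr) + \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr) B_n^c \Bigr) \\
 &= \mu^*(ZB_n) + \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n-1} B_j \Bigr) \Bigr)
\end{align*}

because $B_n \in \mathscr{F}^*$. It follows by induction on $n$ that

\[ \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr) \Bigr) = \sum_{j=1}^{n} \mu^*(ZB_j). \tag{7} \]

Since $\bigcup_{j=1}^{n} B_j \in \mathscr{F}^*$, we have by (7) and the monotonicity of $\mu^*$:

\begin{align*}
\mu^*(Z) &= \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr) \Bigr) + \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{n} B_j \Bigr)^c \Bigr) \\
 &\ge \sum_{j=1}^{n} \mu^*(ZB_j) + \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{\infty} B_j \Bigr)^c \Bigr).
\end{align*}

Letting $n \uparrow \infty$ and using property (c) of $\mu^*$, we obtain

\[ \mu^*(Z) \ge \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{\infty} B_j \Bigr) \Bigr) + \mu^*\Bigl( Z \Bigl( \bigcup_{j=1}^{\infty} B_j \Bigr)^c \Bigr) \]

that establishes $\bigcup_{j=1}^{\infty} B_j \in \mathscr{F}^*$. Thus $\mathscr{F}^*$ is a Borel field.

Finally, let $\{B_j\}$ be a sequence of disjoint sets in $\mathscr{F}^*$. By the property (b)
of $\mu^*$ and (7) with $Z = \Omega$, we have

\[ \mu^*\Bigl( \bigcup_{j=1}^{\infty} B_j \Bigr) \ge \limsup_n \mu^*\Bigl( \bigcup_{j=1}^{n} B_j \Bigr) = \lim_n \sum_{j=1}^{n} \mu^*(B_j) = \sum_{j=1}^{\infty} \mu^*(B_j). \]

Combined with the property (c) of $\mu^*$, we obtain the countable additivity of
$\mu^*$ on $\mathscr{F}^*$, namely the property (d) for a measure:

\[ \mu^*\Bigl( \bigcup_{j=1}^{\infty} B_j \Bigr) = \sum_{j=1}^{\infty} \mu^*(B_j). \]

The proof of Theorem 2 is complete.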
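The construction can be traced by hand on a finite space. In the sketch below everything is an assumed toy example: the space has four points, the field is generated by the partition $\{0,1\}, \{2\}, \{3\}$, and the atom masses are arbitrary positive integers. For a field generated by a finite partition the infimum in Definition 3 is attained by the atoms that meet $A$, which is how `outer` is written; `caratheodory` then tests criterion (3) of Definition 4 against every $Z \subset \Omega$.

```python
from itertools import chain, combinations

OMEGA = frozenset(range(4))
# Hypothetical measure on the atoms of the field generated by {0,1}, {2}, {3}.
ATOMS = {frozenset({0, 1}): 4, frozenset({2}): 2, frozenset({3}): 1}

def subsets(s):
    """All subsets of s, i.e. the total Borel field of a finite space."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def outer(A):
    """Outer measure of A: the cheapest covering uses exactly the atoms that meet A."""
    return sum(mass for atom, mass in ATOMS.items() if atom & A)

def caratheodory(A):
    """Criterion (3): outer(Z) = outer(A Z) + outer(A^c Z) for every Z."""
    return all(outer(Z) == outer(A & Z) + outer((OMEGA - A) & Z) for Z in subsets(OMEGA))

measurable = [A for A in subsets(OMEGA) if caratheodory(A)]
print(len(measurable), sorted(sorted(A) for A in measurable))
```

In this example the sets passing the criterion are exactly the eight unions of atoms, so the class defined by (3) coincides with the original field; a set such as $\{0\}$, which splits an atom, fails (3) with $Z = \{0, 1\}$.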


2 Characterization of extensions 
We have proved that

\[ \mathscr{J} \supset \mathscr{F}^* \supset \mathscr{F} \supset \mathscr{F}_0, \]

where some of the "$\supset$" may turn out to be "=". Since we have extended the
measure $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}^*$ in Theorem 2, what for $\mathscr{F}$? The answer will appear
in the sequel.

The triple $(\Omega, \mathscr{F}, \mu)$ where $\mathscr{F}$ is a Borel field of subsets of $\Omega$, and $\mu$ is a
measure on $\mathscr{F}$, will be called a measure space. It is qualified by the adjective
"finite" when $\mu(\Omega) < \infty$, and by the noun "probability" when $\mu(\Omega) = 1$.

A more general case is defined below.


DEFINITION 5. A measure $\mu$ on a field $\mathscr{F}_0$ (not necessarily Borel field) is
said to be $\sigma$-finite iff there exists a sequence of sets $\{\Omega_n, n \in N\}$ in $\mathscr{F}_0$ such
that $\mu(\Omega_n) < \infty$ for each $n$, and $\bigcup_n \Omega_n = \Omega$. In this case the measure space
$(\Omega, \mathscr{F}, \mu)$, where $\mathscr{F}$ is the minimal Borel field containing $\mathscr{F}_0$, is said to be
"$\sigma$-finite on $\mathscr{F}_0$".


Theorem 3. Let $\mathscr{F}_0$ be a field and $\mathscr{F}$ the Borel field generated by $\mathscr{F}_0$. Let $\mu_1$
and $\mu_2$ be two measures on $\mathscr{F}$ that agree on $\mathscr{F}_0$. If one of them, hence both
are $\sigma$-finite on $\mathscr{F}_0$, then they agree on $\mathscr{F}$.


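PROOF. Let $\{\Omega_n\}$ be as in Definition 5. Define a class $\mathcal{C}$ of subsets of $\Omega$
as follows:

$$\mathcal{C} = \{A \subset \Omega : \mu_1(\Omega_n A) = \mu_2(\Omega_n A) \text{ for all } n \in N\}.$$

Since $\Omega_n \in \mathcal{F}_0$, for any $A \in \mathcal{F}_0$ we have $\Omega_n A \in \mathcal{F}_0$ for all $n$; hence $\mathcal{C} \supset \mathcal{F}_0$.
Suppose $A_k \in \mathcal{C}$, $A_k \subset A_{k+1}$ for all $k \in N$ and $A_k \uparrow A$. Then by property (e)
of $\mu_1$ and $\mu_2$ as measures on $\mathcal{F}$, we have for each $n$:

$$\mu_1(\Omega_n A) = \lim_k \uparrow \mu_1(\Omega_n A_k) = \lim_k \uparrow \mu_2(\Omega_n A_k) = \mu_2(\Omega_n A).$$

Thus $A \in \mathcal{C}$. Similarly by property (f), and the hypothesis $\mu_1(\Omega_n) =
\mu_2(\Omega_n) < \infty$, if $A_k \in \mathcal{C}$ and $A_k \downarrow A$, then $A \in \mathcal{C}$. Therefore $\mathcal{C}$ is closed under
both increasing and decreasing limits; hence $\mathcal{C} \supset \mathcal{F}$ by Theorem 2.1.2 of the
main text. This implies for any $A \in \mathcal{F}$:

$$\mu_1(A) = \lim_n \uparrow \mu_1(\Omega_n A) = \lim_n \uparrow \mu_2(\Omega_n A) = \mu_2(A)$$

by property (e) once again. Thus $\mu_1$ and $\mu_2$ agree on $\mathcal{F}$.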

It follows from Theorem 3 that under the $\sigma$-finite assumption there, the
outer measure $\mu^*$ in Theorem 2 restricted to the minimal Borel field $\mathcal{F}$
containing $\mathcal{F}_0$ is the unique extension of $\mu$ from $\mathcal{F}_0$ to $\mathcal{F}$. What about the
more extensive extension to $\mathcal{F}^*$? We are going to prove that it is also unique
when a further property is imposed on the extension. We begin by defining
two classes of special sets in $\mathcal{F}$.


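DEFINITION 6. Given the field $\mathcal{F}_0$ of sets in $\Omega$, let $\mathcal{F}_{0\sigma\delta}$ be the collection
of all sets of the form $\bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} B_{mn}$ where each $B_{mn} \in \mathcal{F}_0$, and $\mathcal{F}_{0\delta\sigma}$ be the
collection of all sets of the form $\bigcup_{m=1}^{\infty} \bigcap_{n=1}^{\infty} B_{mn}$ where each $B_{mn} \in \mathcal{F}_0$.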


382 | SUPPLEMENT: MEASURE AND INTEGRAL 


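Both these collections belong to $\mathcal{F}$ because the Borel field is closed under
countable union and intersection, and these operations may be iterated, here
twice only, for each collection. If $B \in \mathcal{F}_0$, then $B$ belongs to both $\mathcal{F}_{0\sigma\delta}$ and
$\mathcal{F}_{0\delta\sigma}$ because we can take $B_{mn} = B$. Finally, $A \in \mathcal{F}_{0\sigma\delta}$ if and only if $A^c \in \mathcal{F}_{0\delta\sigma}$,
because

$$\Big(\bigcap_m \bigcup_n B_{mn}\Big)^c = \bigcup_m \bigcap_n B_{mn}^c.$$
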
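Theorem 4. Let $A \in \mathcal{F}^*$. There exists $B \in \mathcal{F}_{0\sigma\delta}$ such that

$$A \subset B; \quad \mu^*(A) = \mu^*(B).$$

If $\mu$ is $\sigma$-finite on $\mathcal{F}_0$, then there exists $C \in \mathcal{F}_{0\delta\sigma}$ such that

$$C \subset A; \quad \mu^*(C) = \mu^*(A).$$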


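PROOF. For each $m$, there exists $\{B_{mn}\}$ in $\mathcal{F}_0$ such that

$$A \subset \bigcup_n B_{mn}; \quad \sum_n \mu^*(B_{mn}) \le \mu^*(A) + \frac{1}{m}.$$

Put

$$B_m = \bigcup_n B_{mn}; \quad B = \bigcap_m B_m;$$

then $A \subset B$ and $B \in \mathcal{F}_{0\sigma\delta}$. We have

$$\mu^*(B) \le \mu^*(B_m) \le \sum_n \mu^*(B_{mn}) \le \mu^*(A) + \frac{1}{m}.$$

Letting $m \uparrow \infty$ we see that $\mu^*(B) \le \mu^*(A)$; hence $\mu^*(B) = \mu^*(A)$. The first
assertion of the theorem is proved.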


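To prove the second assertion, let $\Omega_n$ be as in Definition 5. Applying the
first assertion to $\Omega_n A^c$, we have $B_n \in \mathcal{F}_{0\sigma\delta}$ such that

$$\Omega_n A^c \subset B_n; \quad \mu^*(\Omega_n A^c) = \mu^*(B_n).$$

Hence we have

$$\Omega_n A^c \subset \Omega_n B_n; \quad \mu^*(\Omega_n A^c) = \mu^*(\Omega_n B_n).$$

Taking complements with respect to $\Omega_n$, we have since $\mu^*(\Omega_n) < \infty$:

$$\Omega_n A \supset \Omega_n B_n^c;$$

$$\mu^*(\Omega_n A) = \mu^*(\Omega_n) - \mu^*(\Omega_n A^c) = \mu^*(\Omega_n) - \mu^*(\Omega_n B_n) = \mu^*(\Omega_n B_n^c).$$

Since $\Omega_n \in \mathcal{F}_0$ and $B_n^c \in \mathcal{F}_{0\delta\sigma}$, it is easy to verify that $\Omega_n B_n^c \in \mathcal{F}_{0\delta\sigma}$ by the
distributive law for the intersection with a union. Put

$$C = \bigcup_n \Omega_n B_n^c.$$

It is trivial that $C \in \mathcal{F}_{0\delta\sigma}$ and

$$A = \bigcup_n \Omega_n A \supset C.$$

Consequently, we have

$$\mu^*(A) \ge \mu^*(C) \ge \liminf_n \mu^*(\Omega_n B_n^c) = \liminf_n \mu^*(\Omega_n A) = \mu^*(A),$$

the last equation owing to property (e) of the measure $\mu^*$. Thus $\mu^*(A) =
\mu^*(C)$, and the assertion is proved.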

The measure $\mu^*$ on $\mathcal{F}^*$ is constructed from the measure $\mu$ on the field
$\mathcal{F}_0$. The restriction of $\mu^*$ to the minimal Borel field $\mathcal{F}$ containing $\mathcal{F}_0$ will
henceforth be denoted by $\mu$ instead of $\mu^*$.

In a general measure space $(\Omega, \mathcal{G}, \nu)$, let us denote by $\mathcal{N}(\mathcal{G}, \nu)$ the class
of all sets $A$ in $\mathcal{G}$ with $\nu(A) = 0$. They are called the null sets when $\mathcal{G}$ and
$\nu$ are understood, or $\nu$-null sets when $\mathcal{G}$ is understood. Beware that if $A \subset B$
and $B$ is a null set, it does not follow that $A$ is a null set because $A$ may not
be in $\mathcal{G}$! This remark introduces the following definition.


DEFINITION 7. The measure space $(\Omega, \mathcal{G}, \nu)$ is called complete iff any
subset of a null set is a null set.


Theorem 5. The following three collections of subsets of $\Omega$ are identical:

(i) $A \subset \Omega$ and the outer measure $\mu^*(A) = 0$;
(ii) $A \in \mathcal{F}^*$ and $\mu^*(A) = 0$;
(iii) $A \subset B$ where $B \in \mathcal{F}$ and $\mu(B) = 0$.

It is the collection $\mathcal{N}(\mathcal{F}^*, \mu^*)$.


PROOF. If $\mu^*(A) = 0$, we will prove $A \in \mathcal{F}^*$ by verifying the criterion
(3). For any $Z \subset \Omega$, we have by properties (a) and (b) of $\mu^*$:

$$0 \le \mu^*(ZA) \le \mu^*(A) = 0; \quad \mu^*(ZA^c) \le \mu^*(Z);$$

and consequently by property (c):

$$\mu^*(Z) = \mu^*(ZA \cup ZA^c) \le \mu^*(ZA) + \mu^*(ZA^c) \le \mu^*(Z).$$

Thus (3) is satisfied and we have proved that (i) and (ii) are equivalent.

Next, let $A \in \mathcal{F}^*$ and $\mu^*(A) = 0$. Then we have by the first assertion in
Theorem 4 that there exists $B \in \mathcal{F}$ such that $A \subset B$ and $\mu^*(A) = \mu(B)$. Thus
$A$ satisfies (iii). Conversely, if $A$ satisfies (iii), then by property (b) of outer
measure: $\mu^*(A) \le \mu^*(B) = \mu(B) = 0$, and so (i) is true.

As a consequence, any subset of a $(\mathcal{F}^*, \mu^*)$-null set is a $(\mathcal{F}^*, \mu^*)$-null set.
This is the first assertion in the next theorem.


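Theorem 6. The measure space $(\Omega, \mathcal{F}^*, \mu^*)$ is complete. Let $(\Omega, \mathcal{G}, \nu)$ be a
complete measure space; $\mathcal{G} \supset \mathcal{F}_0$ and $\nu = \mu$ on $\mathcal{F}_0$. If $\mu$ is $\sigma$-finite on $\mathcal{F}_0$ then

$$\mathcal{G} \supset \mathcal{F}^* \text{ and } \nu = \mu^* \text{ on } \mathcal{F}^*.$$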


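PROOF. Let $A \in \mathcal{F}^*$, then by Theorem 4 there exists $B \in \mathcal{F}$ and $C \in \mathcal{F}$
such that

(8) $$C \subset A \subset B; \quad \mu(C) = \mu^*(A) = \mu(B).$$

Since $\nu = \mu$ on $\mathcal{F}_0$, we have by Theorem 3, $\nu = \mu$ on $\mathcal{F}$. Hence by (8) we
have $\nu(B - C) = 0$. Since $A - C \subset B - C$ and $B - C \in \mathcal{G}$, and $(\Omega, \mathcal{G}, \nu)$ is
complete, we have $A - C \in \mathcal{G}$ and so $A = C \cup (A - C) \in \mathcal{G}$.

Moreover, since $C$, $A$, and $B$ belong to $\mathcal{G}$, it follows from (8) that

$$\mu(C) = \nu(C) \le \nu(A) \le \nu(B) = \mu(B)$$

and consequently by (8) again $\nu(A) = \mu^*(A)$. The theorem is proved.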


To summarize the gist of Theorems 4 and 6, if the measure $\mu$ on the field
$\mathcal{F}_0$ is $\sigma$-finite on $\mathcal{F}_0$, then $(\mathcal{F}, \mu)$ is its unique extension to $\mathcal{F}$, and $(\mathcal{F}^*, \mu^*)$
is its minimal complete extension. Here one is tempted to change the notation
$\mu$ to $\mu_0$ on $\mathcal{F}_0$!

We will complete the picture by showing how to obtain $(\mathcal{F}^*, \mu^*)$ from
$(\mathcal{F}, \mu)$, reversing the order of previous construction. Given the measure space
$(\Omega, \mathcal{F}, \mu)$, let us denote by $\mathcal{C}$ the collection of subsets of $\Omega$ as follows: $A \in \mathcal{C}$
iff there exists $B \in \mathcal{N}(\mathcal{F}, \mu)$ such that $A \subset B$. Clearly $\mathcal{C}$ has the "hereditary"
property: if $A$ belongs to $\mathcal{C}$, then all subsets of $A$ belong to $\mathcal{C}$; $\mathcal{C}$ is also closed
under countable union. Next, we define the collection

(9) $$\overline{\mathcal{F}} = \{A \subset \Omega \mid A = B - C \text{ where } B \in \mathcal{F}, C \in \mathcal{C}\},$$

where the symbol "$-$" denotes strict difference of sets, namely $B - C = BC^c$
where $C \subset B$. Finally we define a function $\overline{\mu}$ on $\overline{\mathcal{F}}$ as follows, for the $A$
shown in (9):

(10) $$\overline{\mu}(A) = \mu(B).$$


We will legitimize this definition and with the same stroke prove the
monotonicity of $\overline{\mu}$. Suppose then

(11) $$B_1 - C_1 \subset B_2 - C_2, \quad B_i \in \mathcal{F}, C_i \in \mathcal{C}, i = 1, 2.$$

Let $C_1 \subset D \in \mathcal{N}(\mathcal{F}, \mu)$. Then $B_1 \subset B_2 \cup D$ and so $\mu(B_1) \le \mu(B_2 \cup D) =
\mu(B_2)$. When the $\subset$ in (11) is "=", we can interchange $B_1$ and $B_2$ to conclude
that $\mu(B_1) = \mu(B_2)$, so that the definition (10) is legitimate.


Theorem 7. $\overline{\mathcal{F}}$ is a Borel field and $\overline{\mu}$ is a measure on $\overline{\mathcal{F}}$.


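PROOF. Let $A_n \in \overline{\mathcal{F}}$, $n \in N$; so that $A_n = B_n C_n^c$ as in (9). We have then

$$\bigcap_n A_n = \Big(\bigcap_n B_n\Big) \cap \Big(\bigcup_n C_n\Big)^c.$$

Since the class $\mathcal{C}$ is closed under countable union, this shows that $\overline{\mathcal{F}}$ is closed
under countable intersection. Next let $C \subset D$, $D \in \mathcal{N}(\mathcal{F}, \mu)$; then

$$A^c = B^c \cup C = B^c \cup BC = B^c \cup (BD - B(D - C)) = (B^c \cup BD) - B(D - C).$$

Since $B(D - C) \subset D$, we have $B(D - C) \in \mathcal{C}$; hence the above shows that $\overline{\mathcal{F}}$
is also closed under complementation and therefore is a Borel field. Clearly
$\overline{\mathcal{F}} \supset \mathcal{F}$ because we may take $C = \emptyset$ in (9).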

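To prove $\overline{\mu}$ is countably additive on $\overline{\mathcal{F}}$, let $\{A_n\}$ be disjoint sets in
$\overline{\mathcal{F}}$. Then

$$A_n = B_n - C_n, \quad B_n \in \mathcal{F}, C_n \in \mathcal{C}.$$

There exists $D$ in $\mathcal{N}(\mathcal{F}, \mu)$ containing $\bigcup_{n=1}^{\infty} C_n$. Then $\{B_n - D\}$ are
disjoint and

$$\bigcup_{n=1}^{\infty} (B_n - D) \subset \bigcup_{n=1}^{\infty} A_n \subset \bigcup_{n=1}^{\infty} B_n.$$

All these sets belong to $\overline{\mathcal{F}}$ and so by the monotonicity of $\overline{\mu}$:

$$\overline{\mu}\Big(\bigcup_n (B_n - D)\Big) \le \overline{\mu}\Big(\bigcup_n A_n\Big) \le \overline{\mu}\Big(\bigcup_n B_n\Big).$$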




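Since $\overline{\mu} = \mu$ on $\mathcal{F}$, the first and third members above are equal to,
respectively:

$$\mu\Big(\bigcup_n (B_n - D)\Big) = \sum_n \mu(B_n - D) = \sum_n \mu(B_n) = \sum_n \overline{\mu}(A_n);$$

$$\mu\Big(\bigcup_n B_n\Big) \le \sum_n \mu(B_n) = \sum_n \overline{\mu}(A_n).$$

Therefore we have

$$\overline{\mu}\Big(\bigcup_n A_n\Big) = \sum_n \overline{\mu}(A_n).$$

Since $\overline{\mu}(\emptyset) = \overline{\mu}(\emptyset - \emptyset) = \mu(\emptyset) = 0$, $\overline{\mu}$ is a measure on $\overline{\mathcal{F}}$.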


Corollary. In truth: $\overline{\mathcal{F}} = \mathcal{F}^*$ and $\overline{\mu} = \mu^*$.


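PROOF. For any $A \in \mathcal{F}^*$, by the first part of Theorem 4, there exists $B \in
\mathcal{F}$ such that

$$A = B - (B - A), \quad \mu^*(B - A) = 0.$$

Hence by Theorem 5, $B - A \in \mathcal{C}$ and so $A \in \overline{\mathcal{F}}$ by (9). Thus $\mathcal{F}^* \subset \overline{\mathcal{F}}$. Since
$\mathcal{F} \subset \mathcal{F}^*$ and $\mathcal{C} \subset \mathcal{F}^*$ by Theorem 6, we have $\overline{\mathcal{F}} \subset \mathcal{F}^*$ by (9). Hence $\overline{\mathcal{F}} =
\mathcal{F}^*$. It follows from the above that $\mu^*(A) = \mu(B) = \overline{\mu}(A)$. Hence $\mu^* = \overline{\mu}$ on
$\mathcal{F}^* = \overline{\mathcal{F}}$.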
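The construction (9)-(10) can be tried out on a finite space. The following is only a sketch under assumed data: the four-point space, the atoms and their masses are hypothetical choices made for illustration, and the names `powerset`, `F`, `C` and `F_bar` are not from the text.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# Hypothetical toy space: the Borel field F has atoms {1}, {2, 3}, {4}.
atoms = [frozenset({1}), frozenset({2, 3}), frozenset({4})]
atom_mass = {atoms[0]: 1, atoms[1]: 0, atoms[2]: 2}

# F = all unions of atoms; mu is additive over the atoms.
F = {}
for chosen in powerset(atoms):
    B = frozenset().union(*chosen)
    F[B] = sum(atom_mass[a] for a in chosen)

# C = all subsets of mu-null sets of F (the "hereditary" class).
C = {A for B, mass in F.items() if mass == 0 for A in powerset(B)}

# (9): F_bar = {B - C0 : B in F, C0 in C, C0 a subset of B};
# (10): mu_bar(B - C0) = mu(B).
F_bar = {}
for B, mass in F.items():
    for C0 in C:
        if C0 <= B:
            A = B - C0
            # Legitimacy of (10): every representation gives the same value.
            assert F_bar.setdefault(A, mass) == mass
```

In this toy case the set {2} is not in F, yet it is a subset of the null set {2, 3}; the completed field contains it with measure zero, and here the completion happens to be the whole power set.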


The question arises naturally whether we can extend $\mu$ from $\mathcal{F}_0$ to $\mathcal{F}$
directly without the intervention of $\mathcal{F}^*$. This is indeed possible by a method of
transfinite induction originally conceived by Borel; see an article by LeBlanc 
and G.E. Fox: “On the extension of measure by the method of Borel”, Cana- 
dian Journal of Mathematics, 1956, pp. 516-523. It is technically lengthier 
than the method of outer measure expounded here. 

Although the case of a countable space $\Omega$ can be treated in an obvious
way, it is instructive to apply the general theory to see what happens.

Let $\Omega = N \cup \omega$; $\mathcal{F}_0$ is the minimal field (not Borel field) containing each
singleton $n$ in $N$, but not $\omega$. Let $N_f$ denote the collection of all finite subsets
of $N$; then $\mathcal{F}_0$ consists of $N_f$ and the complements of members of $N_f$ (with
respect to $\Omega$), the latter all containing $\omega$. Let $0 \le \mu(n) < \infty$ for all $n \in N$; a
measure $\mu$ is defined on $\mathcal{F}_0$ as follows:

$$\mu(A) = \sum_{n \in A} \mu(n) \text{ if } A \in N_f; \quad \mu(A^c) = \mu(\Omega) - \mu(A).$$

We must still define $\mu(\Omega)$. Observe that by the properties of a measure, we
have $\mu(\Omega) \ge \sum_{n \in N} \mu(n) = s$, say.
Now we use Definition 3 to determine the outer measure $\mu^*$. It is easy
to see that for any $A \subset N$, we have

$$\mu^*(A) = \sum_{n \in A} \mu(n).$$

In particular $\mu^*(N) = s$. Next we have

$$\mu^*(\omega) = \inf_{A \in N_f} \mu(A^c) = \mu(\Omega) - \sup_{A \in N_f} \mu(A) = \mu(\Omega) - s$$

provided $s < \infty$; otherwise the inf above is $\infty$. Thus we have

$$\mu^*(\omega) = \mu(\Omega) - s \text{ if } \mu(\Omega) < \infty; \quad \mu^*(\omega) = \infty \text{ if } \mu(\Omega) = \infty.$$

It follows that for any $A \subset \Omega$:

$$\mu^*(A) = \sum_{n \in A} \mu^*(n)$$

where $\mu^*(n) = \mu(n)$ for $n \in N$. Thus $\mu^*$ is a measure on $\mathcal{J}$, namely, $\mathcal{F}^* = \mathcal{J}$.

But it is obvious that $\mathcal{F} = \mathcal{J}$ since $\mathcal{F}$ contains $N$ as countable union and
so contains $\omega$ as complement. Hence $\mathcal{F} = \mathcal{F}^* = \mathcal{J}$.

If $\mu(\Omega) = \infty$ and $s = \infty$, the extension $\mu^*$ of $\mu$ to $\mathcal{J}$ is not unique,
because we can define $\mu(\omega)$ to be any positive number and get an extension.
Thus $\mu$ is not $\sigma$-finite on $\mathcal{F}_0$, by Theorem 3. But we can verify this
directly when $\mu(\Omega) = \infty$, whether $s = \infty$ or $s < \infty$. Thus in the latter case,
$\mu(\omega) = \infty$ is also the unique extension of $\mu$ from $\mathcal{F}_0$ to $\mathcal{J}$. This means that
the condition of $\sigma$-finiteness on $\mathcal{F}_0$ is only a sufficient and not a necessary
condition for the unique extension.

As a ramification of the example above, let $\Omega = N \cup \omega_1 \cup \omega_2$, with two
extra points adjoined to $N$, but keep $\mathcal{F}_0$ as before. Then $\mathcal{F}$ ($= \mathcal{F}^*$) is strictly
smaller than $\mathcal{J}$ because neither $\omega_1$ nor $\omega_2$ belongs to it. From Definition 3 we
obtain

$$\mu^*(\omega_1 \cup \omega_2) = \mu^*(\omega_1) = \mu^*(\omega_2).$$

Thus $\mu^*$ is not even two-by-two additive on $\mathcal{J}$ unless the three quantities
above are zero. The two points $\omega_1$ and $\omega_2$ form an inseparable couple. We
leave it to the curious reader to wonder about other possibilities.
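For the one-point example with finite total mass, the infimum defining the outer measure of the extra point can be watched numerically. The sketch below assumes the hypothetical masses mu(n) = 2^(-n) (so that s = 1) and the hypothetical total mass mu(Omega) = 3; neither choice comes from the text.

```python
from fractions import Fraction

def mu_point(n):
    """Hypothetical point masses mu(n) = 2**-n on N = {1, 2, ...}, so s = 1."""
    return Fraction(1, 2**n)

MU_OMEGA = Fraction(3)  # assumed finite total mass, chosen >= s

def mu_cofinite(K):
    """mu(A^c) for the finite set A = {1, ..., K}: mu(Omega) - mu(A)."""
    return MU_OMEGA - sum(mu_point(n) for n in range(1, K + 1))

# The infimum over finite A is approached along A = {1, ..., K} as K grows.
values = [mu_cofinite(K) for K in range(1, 21)]
```

The values decrease toward mu(Omega) - s = 2, in agreement with the formula for the outer measure of the extra point when mu(Omega) is finite.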


3 Measures in R 


Let $R = (-\infty, +\infty)$ be the set of real numbers, alias the real line, with its
Euclidean topology. For $-\infty \le a < b \le +\infty$,

(12) $$(a, b] = \{x \in R : a < x \le b\}$$

is an interval of a particular shape, namely open at left end and closed at right
end. For $b = +\infty$, $(a, +\infty] = (a, +\infty)$ because $+\infty$ is not in $R$. By choice
of the particular shape, the complement of such an interval is the union of
two intervals of the same shape:

$$(a, b]^c = (-\infty, a] \cup (b, \infty].$$

When $a = b$, of course $(a, a] = \emptyset$ is the empty set. A finite or countably
infinite number of such intervals may merge end to end into a single one as
illustrated below:

(13) $$(0, 2] = (0, 1] \cup (1, 2]; \quad (0, 1] = \bigcup_{n=1}^{\infty} \Big(\frac{1}{n+1}, \frac{1}{n}\Big].$$

Apart from this possibility, the representation of $(a, b]$ is unique.

The minimal Borel field containing all $(a, b]$ will be denoted by $\mathcal{B}$ and
called the Borel field of $R$. Since a bounded open interval is the countable
union of intervals like $(a, b]$, and any open set in $R$ is the countable union
of (disjoint) bounded open intervals, the Borel field $\mathcal{B}$ contains all open
sets; hence by complementation it contains all closed sets, in particular all
compact sets. Starting from one of these collections, forming countable union
and countable intersection successively, a countable number of times, one can
build up $\mathcal{B}$ through a transfinite induction.

Now suppose a measure $m$ has been defined on $\mathcal{B}$, subject to the sole
assumption that its value for a finite (alias bounded) interval be finite, namely
if $-\infty < a < b < +\infty$, then

(14) $$0 \le m((a, b]) < \infty.$$

We associate a point function $F$ on $R$ with the set function $m$ on $\mathcal{B}$, as follows:

(15) $$F(0) = 0; \quad F(x) = m((0, x]) \text{ for } x > 0; \quad F(x) = -m((x, 0]) \text{ for } x < 0.$$

This function may be called the "generalized distribution" for $m$. We see that
$F$ is finite everywhere owing to (14), and

(16) $$m((a, b]) = F(b) - F(a).$$

$F$ is increasing (viz. nondecreasing) in $R$ and so the limits

$$F(+\infty) = \lim_{x \to +\infty} F(x) \le +\infty, \quad F(-\infty) = \lim_{x \to -\infty} F(x) \ge -\infty$$

both exist. We shall write $\infty$ for $+\infty$ sometimes. Next, $F$ has unilateral limits
everywhere, and is right-continuous:

$$F(x-) \le F(x) = F(x+).$$

The right-continuity follows from the monotone limit properties (e) and (f)
of $m$ and the primary assumption (14). The measure of a single point $x$ is
given by

$$m(x) = F(x) - F(x-).$$


We shall denote a point and the set consisting of it (singleton) by the same 
symbol. 
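The relations (16) and $m(x) = F(x) - F(x-)$ can be illustrated on a concrete case. The sketch below assumes a hypothetical generalized distribution, $F(x) = x$ plus a unit jump at $x = 1$, with its left limit written out by hand; the function names are illustrative only.

```python
def F(x):
    """Hypothetical example: F(x) = x plus a unit jump at x = 1 (right-continuous, F(0) = 0)."""
    return x + (1.0 if x >= 1 else 0.0)

def F_left(x):
    """Left limit F(x-) of the example F above."""
    return x + (1.0 if x > 1 else 0.0)

def m_interval(a, b):
    """m((a, b]) = F(b) - F(a), as in (16)."""
    return F(b) - F(a)

def m_point(x):
    """Measure of the single point x: F(x) - F(x-)."""
    return F(x) - F_left(x)
```

With this choice, `m_interval(0, 2)` splits as `m_interval(0, 1) + m_interval(1, 2)`, and the only point carrying positive measure is the jump point 1.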

The simplest example of $F$ is given by $F(x) = x$. In this case $F$ is
continuous everywhere and (16) becomes


m((a, b]) = b—a. 


We can replace (a, b] above by (a,b), [a, b) or [a, b] because m(x) = 0 for 
each x. This measure is the length of the line-segment from a to b. It was
in this classic case that the following extension was first conceived by Emile 
Borel (1871-1956). 

We shall follow the methods in §§1-2, due to H. Lebesgue and
C. Carathéodory. Given $F$ as specified above, we are going to construct a
measure $m$ on $\mathcal{B}$ and a larger Borel field $\mathcal{B}^*$ that fulfills the prescription (16).

The first step is to determine the minimal field $\mathcal{B}_0$ containing all $(a, b]$.
Since a field is closed under finite union, it must contain all sets of the form

(17) $$B = \bigcup_{j=1}^{n} I_j, \quad I_j = (a_j, b_j], \; 1 \le j \le n; \; n \in N.$$

Without loss of generality, we may suppose the intervals $I_j$ to be disjoint, by
merging intersections as illustrated by

$$(1, 3] \cup (2, 4] = (1, 4].$$

Then it is clear that the complement $B^c$ is of the same form. The union of
two sets like $B$ is also of the same form. Thus the collection of all sets like
$B$ already forms a field and so it must be $\mathcal{B}_0$. Of course it contains (includes)
the empty set $\emptyset = (a, a]$ and $R$. However, it does not contain any $(a, b)$ except
$R$, $[a, b)$, $[a, b]$, or any single point!
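The merging of intervals and the closure under complementation can be sketched in code. The representation below is an assumption made for illustration: a pair `(a, b)` stands for the half-open interval $(a, b]$, with `math.inf` allowed as an endpoint, and the function names are hypothetical.

```python
import math

def merge(intervals):
    """Write a finite union of intervals (a, b] as a disjoint, sorted union."""
    merged = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if merged and a <= merged[-1][1]:
            # Overlapping or abutting: the two pieces form a single (a, b].
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def complement(intervals):
    """Complement in R of a finite union of (a, b]; the result has the same form."""
    result, left = [], -math.inf
    for a, b in merge(intervals):
        if left < a:
            result.append((left, a))
        left = b
    if left < math.inf:
        result.append((left, math.inf))
    return result
```

For instance, `merge([(1, 3), (2, 4)])` reproduces the illustration in the text, and taking the complement twice returns the merged form of the original set.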

Next we define a measure $m$ on $\mathcal{B}_0$ satisfying (16). Since the condition
(d) in Definition 2 requires it to be finitely additive, there is only one way:
for the generic $B$ in (17) with disjoint $I_j$ we must put

(18) $$m(B) = \sum_{j=1}^{n} m(I_j) = \sum_{j=1}^{n} (F(b_j) - F(a_j)).$$

Having so defined $m$ on $\mathcal{B}_0$, we must now prove that it satisfies the
condition (d) in toto, in order to proclaim it to be a measure on $\mathcal{B}_0$. Namely, if
$\{B_k, 1 \le k \le l \le \infty\}$ is a finite or countable sequence of disjoint sets in $\mathcal{B}_0$,
we must prove

(19) $$m\Big(\bigcup_{k=1}^{l} B_k\Big) = \sum_{k=1}^{l} m(B_k)$$

whenever $l$ is finite, and moreover when $l = \infty$ and the union $\bigcup_{k=1}^{\infty} B_k$ happens
to be in $\mathcal{B}_0$.

The case for a finite $l$ is really clear. If each $B_k$ is represented as in
(17), then the disjoint union of a finite number of them is represented in a
similar manner by pooling together all the disjoint $I_j$'s from the $B_k$'s. Then the
equation (19) just means that a finite double array of numbers can be summed
in two orders.

If that is so easy, what is the difficulty when $l = \infty$? It turns out, as
Borel saw clearly, that the crux of the matter lies in the following fabulous 
“banality.” 


Borel's lemma. If $-\infty \le a < b \le +\infty$ and

(20) $$(a, b] = \bigcup_{j=1}^{\infty} (a_j, b_j],$$

where $a_j < b_j$ for each $j$, and the intervals $(a_j, b_j]$ are disjoint, then we have

(21) $$F(b) - F(a) = \sum_{j=1}^{\infty} (F(b_j) - F(a_j)).$$


PROOF. We will first give a transfinite argument that requires knowledge
of ordinal numbers. But it is so intuitively clear that it can be appreciated
without that prerequisite. Looking at (20) we see there is a unique index
$j$ such that $b_j = b$; name that index $k$ and rename $a_k$ as $c_1$. By removing
$(a_k, b_k] = (c_1, b]$ from both sides of (20) we obtain

(22) $$(a, c_1] = \bigcup_{j=1, \, j \ne k}^{\infty} (a_j, b_j].$$

This small but giant step shortens the original $(a, b]$ to $(a, c_1]$. Obviously we
can repeat the process and shorten it to $(a, c_2]$ where $a \le c_2 < c_1 < b$, and so
by mathematical induction we obtain a sequence $a \le c_n < \cdots < c_2 < c_1 < b$.

Needless to say, if for some $n$ we have $c_n = a$, then we have
accomplished our purpose, but this cannot happen under our specific assumptions
because we have not used up all the infinite number of intervals in the union.
Therefore the process must go on ad infinitum. Suppose then $c_n > c_{n+1}$ for
all $n \in N$, so that $c_\omega = \lim_n \downarrow c_n$ exists, then $c_\omega \ge a$. If $c_\omega = a$ (which can
easily happen, see (13)), then we are done and (21) follows, although the terms
in the series have been gathered step by step in a (possibly) different order.
What if $c_\omega > a$? In this case there is a unique $j$ such that $b_j = c_\omega$; rename
the corresponding $a_j$ as $c_{\omega 1}$. We have now

(23) $$(a, c_\omega] = \bigcup_{j=1}^{\infty} (a'_j, b'_j],$$

where the $(a'_j, b'_j]$'s are the leftovers from the original collection in (20) after
an infinite number of them have been removed in the process. The interval
$(c_{\omega 1}, c_\omega]$ is contained in the reduced new collection and we can begin a new
process by first removing it from both sides of (23), then the next, to be
denoted by $(c_{\omega 2}, c_{\omega 1}]$, and so on. If for some $n$ we have $c_{\omega n} = a$, then (21)
is proved because at each step a term in the sum is gathered. Otherwise there
exists the limit $\lim_n \downarrow c_{\omega n} = c_{\omega\omega} \ge a$. If $c_{\omega\omega} = a$, then (21) follows in the limit.
Otherwise $c_{\omega\omega}$ must be equal to some $b_j$ (why?), and the induction goes on.
Let us spare ourselves of the cumbersome notation for the successive
well-ordered ordinal numbers. But will this process stop after a countable number
of steps, namely, does there exist an ordinal number $\alpha$ of countable cardinality
such that $c_\alpha = a$? The answer is "yes" because there are only countably many
intervals in the union (20).

The preceding proof (which may be made logically formal) reveals the 
possibly complex structure hidden in the “‘order-blind” union in (20). Borel in 
his Thèse (1894) adopted a similar argument to prove a more general result
that became known as his Covering Theorem (see below). A proof of the latter 
can be found in any text on real analysis, without the use of ordinal numbers. 
We will use the covering theorem to give another proof of Borel’s lemma, for 
the sake of comparison (and learning). 

This second proof establishes the equation (21) by two inequalities in 
opposite direction. The first inequality is easy by considering the first n terms 
in the disjoint union (20): 


$$F(b) - F(a) \ge \sum_{j=1}^{n} (F(b_j) - F(a_j)).$$

As $n$ goes to infinity we obtain (21) with the "=" replaced by "$\ge$".
The other half is more subtle: the reader should pause and think why? 
The previous argument with ordinal numbers tells the story. 




Borel's covering theorem. Let $[a, b]$ be a compact interval, and $(a_j, b_j)$,
$j \in N$, be bounded open intervals, which may intersect arbitrarily, such that

(24) $$[a, b] \subset \bigcup_{j=1}^{\infty} (a_j, b_j).$$

Then there exists a finite integer $l$ such that when $l$ is substituted for $\infty$ in
the above, the inclusion remains valid.

In other words, a finite subset of the original infinite set of open inter- 
vals suffices to do the covering. This theorem is also called the Heine—Borel 
Theorem; see Hardy [1] (in the general Bibliography) for two proofs by Besi- 
covitch. 


To apply (24) to (20), we must alter the shape of the intervals $(a_j, b_j]$ to
fit the picture in (24).

Let $-\infty < a < b < \infty$; and $\epsilon > 0$. Choose $a'$ in $(a, b)$, and for each $j$
choose $b'_j > b_j$ such that

(25) $$F(a') - F(a) < \frac{\epsilon}{2}; \quad F(b'_j) - F(b_j) < \frac{\epsilon}{2^{j+1}}.$$

These choices are possible because $F$ is right continuous; and now we have

$$[a', b] \subset \bigcup_{j=1}^{\infty} (a_j, b'_j)$$

as required in (24). Hence by Borel's theorem, there exists a finite $l$ such that

(26) $$[a', b] \subset \bigcup_{j=1}^{l} (a_j, b'_j).$$

From this it follows "easily" that

(27) $$F(b) - F(a') \le \sum_{j=1}^{l} (F(b'_j) - F(a_j)).$$

We will spell out the proof by induction on $l$. When $l = 1$ it is obvious.
Suppose the assertion has been proved for $l - 1$, $l \ge 2$. From (26) as written,
there is $k$, $1 \le k \le l$, such that $a_k < a' < b'_k$, and so

(28) $$F(b'_k) - F(a') \le F(b'_k) - F(a_k).$$
If we intersect both sides of (26) with the complement of $(a_k, b'_k)$, we obtain

$$[b'_k, b] \subset \bigcup_{j=1, \, j \ne k}^{l} (a_j, b'_j).$$

Here the number of intervals on the right side is $l - 1$; hence by the induction
hypothesis we have

$$F(b) - F(b'_k) \le \sum_{j=1, \, j \ne k}^{l} (F(b'_j) - F(a_j)).$$

Adding this to (28) we obtain (27), and the induction is complete. It follows
from (27) and (25) that

$$F(b) - F(a) \le \sum_{j=1}^{l} (F(b_j) - F(a_j)) + \epsilon.$$

Beware that the $l$ above depends on $\epsilon$. However, if we change $l$ to $\infty$ (back to
infinity!) then the infinite series of course does not depend on $\epsilon$. Therefore we
can let $\epsilon \to 0$ to obtain (21) when the "=" there is changed to "$\le$", namely
the other half of Borel's lemma, for finite $a$ and $b$.
It remains to treat the case $a = -\infty$ and/or $b = +\infty$. Let

$$(-\infty, b] \subset \bigcup_{j=1}^{\infty} (a_j, b_j].$$

Then for any $a$ in $(-\infty, b)$, (21) holds with "=" replaced by "$\le$". Letting
$a \to -\infty$ we obtain the desired result. The case $b = +\infty$ is similar. Q.E.D.
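Borel's lemma can be checked by hand on the decomposition of $(0, 1]$ given in (13). The sketch below assumes the hypothetical choice $F(x) = x^3$ (increasing and continuous on $R$, with $F(0) = 0$) and uses exact rational arithmetic; it illustrates the statement and is not a substitute for the proof.

```python
from fractions import Fraction

def F(x):
    """Hypothetical increasing, continuous F with F(0) = 0."""
    return x ** 3

def partial_sum(n):
    """Sum of F(b_j) - F(a_j) over the first n intervals (1/(j+1), 1/j] of (13)."""
    return sum(F(Fraction(1, j)) - F(Fraction(1, j + 1)) for j in range(1, n + 1))

total = F(Fraction(1)) - F(Fraction(0))   # F(b) - F(a) for (a, b] = (0, 1]
gaps = [total - partial_sum(n) for n in (1, 10, 100, 1000)]
```

Each gap equals $F(1/(n+1)) - F(0)$, which tends to zero precisely because $F$ is right-continuous at 0; this is the point where the second proof uses (25).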


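In the following, all $I$ with subscripts denote intervals of the shape $(a, b]$;
$\sum$ denotes union of disjoint sets. Let $B \in \mathcal{B}_0$; $B_j \in \mathcal{B}_0$, $j \in N$. Thus

$$B = \sum_{i=1}^{n} I_i; \quad B_j = \sum_{k=1}^{n_j} I_{jk}.$$

Suppose

$$B = \sum_{j=1}^{\infty} B_j$$

so that

(29) $$\sum_{i=1}^{n} I_i = \sum_{j=1}^{\infty} \sum_{k=1}^{n_j} I_{jk}.$$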




We will prove

(30) $$\sum_{i=1}^{n} m(I_i) = \sum_{j=1}^{\infty} \sum_{k=1}^{n_j} m(I_{jk}).$$

For $n = 1$, (29) is of the form (20) since a countable set of sets can be
ordered as a sequence. Hence (30) follows by Borel's lemma. In general,
simple geometry shows that each $I_i$ in (29) is the union of a subcollection of
the $I_{jk}$'s. This is easier to see if we order the $I_i$'s in algebraic order and, after
merging where possible, separate them at nonzero distances. Therefore (30)
follows by adding $n$ equations, each of which results from Borel's lemma.

This completes the proof of the countable additivity of $m$ on $\mathcal{B}_0$, namely
(19) is true as stipulated there for $l = \infty$ as well as $l < \infty$.

The general method developed in §1 can now be applied to $(R, \mathcal{B}_0, m)$.
Substituting $\mathcal{B}_0$ for $\mathcal{F}_0$, $m$ for $\mu$ in Definition 3, we obtain the outer measure
$m^*$. It is remarkable that the countable additivity of $m$ on $\mathcal{B}_0$, for which two
painstaking proofs were given above, is used exactly in one place, at the
beginning of Theorem 1, to prove that $m^* = m$ on $\mathcal{B}_0$. Next, we define the Borel
field $\mathcal{B}^*$ as in Definition 4. By Theorem 6, $(R, \mathcal{B}^*, m^*)$ is a complete measure
space. By Definition 5, $m$ is $\sigma$-finite on $\mathcal{B}_0$ because $(-n, n] \uparrow (-\infty, \infty)$ as
$n \uparrow \infty$ and $m((-n, n])$ is finite by our primary assumption (14). Hence by
Theorem 3, the restriction of $m^*$ to $\mathcal{B}$ is the unique extension of $m$ from $\mathcal{B}_0$
to $\mathcal{B}$.

In the most important case where $F(x) = x$, the measure $m$ on $\mathcal{B}_0$ is the
length: $m((a, b]) = b - a$. It was Borel who, around the turn of the twentieth
century, first conceived of the notion of a countably additive "length" on an
extensive class of sets, now named after him: the Borel field $\mathcal{B}$. A member of
this class is called a Borel set. The larger Borel field $\mathcal{B}^*$ was first constructed
by Lebesgue from an outer and an inner measure (see pp. 28-29 of main
text). The latter was later bypassed by Carathéodory, whose method is adopted
here. A member of $\mathcal{B}^*$ is usually called Lebesgue-measurable. The intimate
relationship between $\mathcal{B}$ and $\mathcal{B}^*$ is best seen from Theorem 7.

The generalization to a generalized distribution function F is sometimes 
referred to as Borel—Lebesgue—Stieltjes. See §2.2 of the main text for the 
special case of a probability distribution. 

The generalization to a Euclidean space of higher dimension presents no 
new difficulty and is encumbered with tedious geometrical “baggage”. 

It can be proved that the cardinal number of all Borel sets is that of the
real numbers (viz. all points in $R$), commonly denoted by $c$ (the continuum).
On the other hand, if $Z$ is a Borel set of cardinal $c$ with $m(Z) = 0$, such
as the Cantor ternary set (p. 13 of main text), then by the remark preceding
Theorem 6, all subsets of $Z$ are Lebesgue-measurable and hence their totality
has cardinal $2^c$ which is strictly greater than $c$ (see e.g. [3]). It follows that
there are incomparably more Lebesgue-measurable sets than Borel sets.

It is however not easy to exhibit a set in $\mathcal{B}^*$ but not in $\mathcal{B}$; see Exercise
No. 15 on p. 15 of the main text for a clue, but that example uses a non- 
Lebesgue-measurable set to begin with. 

Are there non-Lebesgue-measurable sets? Using the Axiom of Choice, 
we can “define” such a set rather easily; see example [3] or [5]. However, Paul 
Cohen has proved that the axiom is independent of the other logical axioms 
known as Zermelo—Fraenkel system commonly adopted in mathematics; and 
Robert Solovay has proved that in a certain model without the axiom of 
choice, all sets of real numbers are Lebesgue-measurable. In the notation of
Definition 1 in §1 in this case, $\mathcal{B}^* = \mathcal{J}$ and the outer measure $m^*$ is a measure
on $\mathcal{J}$.

N.B. Although no explicit invocation is made of the axiom of choice in 
the main text of this book, a weaker version of it under the prefix “countable” 
must have been casually employed on the q.t. Without the latter, allegedly it is 
impossible to show that the union of a countable collection of countable sets 
is countable. This kind of logical finesse is beyond the scope of this book. 


4 Integral 


The measure space $(\Omega, \mathcal{F}, \mu)$ is fixed. A function $f$ with domain $\Omega$ and
range in $R^* = [-\infty, +\infty]$ is called $\mathcal{F}$-measurable iff for each real number $c$
we have

$$\{f \le c\} = \{\omega \in \Omega : f(\omega) \le c\} \in \mathcal{F}.$$

We write $f \in \mathcal{F}$ in this case. It follows that for each set $A \in \mathcal{B}$, namely a
Borel set, we have

$$\{f \in A\} \in \mathcal{F};$$

and both $\{f = +\infty\}$ and $\{f = -\infty\}$ also belong to $\mathcal{F}$. Properties of
measurable functions are given in Chapter 3, although the measure there is a
probability measure.

A function $f \in \mathcal{F}$ with range a countable set in $[0, \infty]$ will be called
a basic function. Let $\{a_j\}$ be its range (which may include "$\infty$"), and $A_j =
\{f = a_j\}$. Then the $A_j$'s are disjoint sets with union $\Omega$ and

(31) $$f = \sum_j a_j 1_{A_j}$$

where the sum is over a countable set of $j$.
We proceed to define an integral for functions in $\mathcal{F}$, in three stages,
beginning with basic functions.
DEFINITION 8(a). For the basic function $f$ in (31), its integral is defined
to be

(32) $$E(f) = \sum_j a_j \mu(A_j)$$

and is also denoted by

$$\int_\Omega f \, d\mu = \int_\Omega f(\omega) \, \mu(d\omega).$$

If a term in (32) is $0 \cdot \infty$ or $\infty \cdot 0$, it is taken to be 0. In particular if $f \equiv 0$, then
$E(0) = 0$ even if $\mu(\Omega) = \infty$. If $A \in \mathcal{F}$ and $\mu(A) = 0$, then the basic function

$$\infty \cdot 1_A + 0 \cdot 1_{A^c}$$

has integral equal to

$$\infty \cdot 0 + 0 \cdot \mu(A^c) = 0.$$
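The convention for $0 \cdot \infty$ matters in practice, because ordinary floating-point arithmetic does not follow it. The sketch below is a hypothetical helper for a basic function given by finitely many pairs $(a_j, \mu(A_j))$; the names `term` and `integral_basic` are illustrative and the restriction to finitely many pairs is an assumption of the sketch.

```python
import math

def term(a, mass):
    """One term a * mu(A) of (32), with the convention 0 * inf = inf * 0 = 0."""
    if a == 0 or mass == 0:
        return 0.0
    return a * mass

def integral_basic(values_and_masses):
    """E(f) = sum_j a_j * mu(A_j) for a basic f given as pairs (a_j, mu(A_j))."""
    return sum(term(a, mass) for a, mass in values_and_masses)
```

Note that in Python `math.inf * 0.0` evaluates to NaN, which is why the convention is coded explicitly in `term` rather than left to the multiplication.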


We list some of the properties of the integral. 
(i) Let $\{B_j\}$ be a countable set of disjoint sets in $\mathcal{F}$, with union $\Omega$ and $\{b_j\}$
arbitrary positive numbers or $\infty$, not necessarily distinct. Then the function

(33) $$\sum_j b_j 1_{B_j}$$

is basic, and its integral is equal to

$$\sum_j b_j \mu(B_j).$$

PROOF. Collect all equal $b_j$'s into $a_j$ and the corresponding $B_j$'s into $A_j$
as in (31). The result follows from the theorem on double series of positive
terms that it may be summed in any order to yield a unique sum, possibly $+\infty$.


(ii) If $f$ and $g$ are basic and $f \le g$, then

$$E(f) \le E(g).$$

In particular if $E(f) = +\infty$, then $E(g) = +\infty$.

PROOF. Let $f$ be as in (31) and $g$ as in (33). The doubly indexed set
$\{A_j \cap B_k\}$ are disjoint and their union is $\Omega$. We have using (i):

$$E(f) = \sum_j \sum_k a_j \mu(A_j \cap B_k);$$

$$E(g) = \sum_k \sum_j b_k \mu(A_j \cap B_k).$$

The order of summation in the second double series may be reversed, and the
result follows by the countable additivity of $\mu$.


(iii) If $f$ and $g$ are basic functions, $a$ and $b$ positive numbers, then $af +
bg$ is basic and

$$E(af + bg) = aE(f) + bE(g).$$

PROOF. It is trivial that $af$ is basic and

$$E(af) = aE(f).$$

Hence it is sufficient to prove the result for $a = b = 1$. Using the double
decomposition in (ii), we have

$$E(f + g) = \sum_j \sum_k (a_j + b_k) \mu(A_j \cap B_k).$$

Splitting the double series in two and then summing in two orders, we obtain
the result.


It is good time to state a general result that contains the double series 
theorem used above and some other version of it that will be used below. 


Double Limit Lemma. Let $\{C_{jk}; j \in N, k \in N\}$ be a doubly indexed array
of real numbers with the following properties:

(a) for each fixed $j$, the sequence $\{C_{jk}; k \in N\}$ is increasing in $k$;
(b) for each fixed $k$, the sequence $\{C_{jk}; j \in N\}$ is increasing in $j$.

Then we have

$$\lim_j \uparrow \lim_k \uparrow C_{jk} = \lim_k \uparrow \lim_j \uparrow C_{jk} \le +\infty.$$

The proof is surprisingly simple. Both repeated limits exist by
fundamental analysis. Suppose first that one of these is the finite number $C$. Then
for any $\epsilon > 0$, there exist $j_0$ and $k_0$ such that $C_{j_0 k_0} > C - \epsilon$. This implies
that the other limit $> C - \epsilon$. Since $\epsilon$ is arbitrary and the two indices are
interchangeable, the two limits must be equal. Next if the $C$ above is $+\infty$,
then changing $C - \epsilon$ into $\epsilon^{-1}$ finishes the same argument.

As an easy exercise, the reader should derive the cited theorem on double 
series from the Lemma. 
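One possible route for this exercise is sketched below, for the case of finite terms; it is offered as a hint under the stated assumption $a_{jk} \ge 0$, and the partial-sum notation $C_{mn}$ is introduced here only for illustration.

```latex
% Assumption: a_{jk} \ge 0 is finite for all j, k \in N.
% Partial sums, increasing in m for fixed n and in n for fixed m:
\[
  C_{mn} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk}.
\]
% Taking the inner limit first in each order (a finite sum of limits):
\begin{align*}
  \lim_{m} \uparrow \lim_{n} \uparrow C_{mn}
    &= \lim_{m} \uparrow \sum_{j=1}^{m} \Big( \sum_{k=1}^{\infty} a_{jk} \Big)
     = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} a_{jk}, \\
  \lim_{n} \uparrow \lim_{m} \uparrow C_{mn}
    &= \lim_{n} \uparrow \sum_{k=1}^{n} \Big( \sum_{j=1}^{\infty} a_{jk} \Big)
     = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} a_{jk}.
\end{align*}
% By the Double Limit Lemma the two left members are equal, possibly +\infty.
```

If some inner series diverges, the corresponding finite sums are $+\infty$ from that index on, so the case of an infinite repeated limit in the Lemma has to be read with that extension in mind.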

Let $A \in \mathcal{F}$ and $f$ be a basic function. Then the product $1_A f$ is a basic
function and its integral will be denoted by

(34) $$E(A; f) = \int_A f(\omega) \, \mu(d\omega) = \int_A f \, d\mu.$$

(iv) Let $A_n \in \mathcal{F}$, $A_n \subset A_{n+1}$ for all $n$ and $A = \bigcup_n A_n$. Then we have

(35) $$\lim_n E(A_n; f) = E(A; f).$$

PROOF. Denote $f$ by (33), so that $1_A f = \sum_j b_j 1_{AB_j}$. By (i),

$$E(A; f) = \sum_j b_j \mu(AB_j)$$

with a similar equation where $A$ is replaced by $A_n$. Since $\mu(A_n B_j) \uparrow \mu(AB_j)$
as $n \uparrow \infty$, and $\sum_{j=1}^{m} \uparrow \sum_{j=1}^{\infty}$ as $m \uparrow \infty$, (35) follows by the double limit
theorem.


Consider now an increasing sequence $\{f_n\}$ of basic functions, namely,
$f_n \le f_{n+1}$ for all $n$. Then $f = \lim_n \uparrow f_n$ exists and $f \in \mathcal{F}$, but of course $f$
need not be basic; and its integral has yet to be defined. By property (ii), the
numerical sequence $E(f_n)$ is increasing and so $\lim_n \uparrow E(f_n)$ exists, possibly
equal to $+\infty$. It is tempting to define $E(f)$ to be that limit, but we need the
following result to legitimize the idea.


Theorem 8. Let $\{f_n\}$ and $\{g_n\}$ be two increasing sequences of basic
functions such that

(36) $$\lim_n \uparrow f_n = \lim_n \uparrow g_n$$

(everywhere in $\Omega$). Then we have

(37) $$\lim_n \uparrow E(f_n) = \lim_n \uparrow E(g_n).$$
PROOF. Denote the common limit function in (36) by $f$ and put

$$A = \{\omega \in \Omega : f(\omega) > 0\},$$

then $A \in \mathcal{F}$. Since $0 \le g_n \le f$, we have $1_{A^c} g_n = 0$ identically; hence by
property (iii):

(38) $$E(g_n) = E(A; g_n) + E(A^c; g_n) = E(A; g_n).$$


Fix an $n$ and put for each $k \in N$:

$$A_k = \Big\{\omega \in \Omega : f_k(\omega) > \frac{n-1}{n} g_n(\omega)\Big\}.$$

Since $f_k \le f_{k+1}$, we have $A_k \subset A_{k+1}$ for all $k$. We are going to prove that

(39) $$\bigcup_{k=1}^{\infty} A_k = A.$$

If $\omega \in A_k$, then $f(\omega) \ge f_k(\omega) > [(n - 1)/n] g_n(\omega) \ge 0$; hence $\omega \in A$. On the
other hand, if $\omega \in A$ then

$$\lim_k \uparrow f_k(\omega) = f(\omega) \ge g_n(\omega)$$

and $f(\omega) > 0$; hence there exists an index $k$ such that

$$f_k(\omega) > \frac{n-1}{n} g_n(\omega)$$

and so $\omega \in A_k$. Thus (39) is proved. By property (ii), since

$$f_k \ge 1_{A_k} f_k \ge 1_{A_k} \Big(\frac{n-1}{n} g_n\Big)$$

we have

$$E(f_k) \ge E(A_k; f_k) \ge \frac{n-1}{n} E(A_k; g_n).$$

Letting $k \uparrow \infty$, we obtain by property (iv):

$$\lim_k \uparrow E(f_k) \ge \frac{n-1}{n} \lim_k E(A_k; g_n) = \frac{n-1}{n} E(A; g_n) = \frac{n-1}{n} E(g_n)$$

where the last equation is due to (38). Now let $n \uparrow \infty$ to obtain

$$\lim_k \uparrow E(f_k) \ge \lim_n \uparrow E(g_n).$$

Since $\{f_n\}$ and $\{g_n\}$ are interchangeable, (37) is proved.


Corollary. Let $f_n$ and $f$ be basic functions such that $f_n \uparrow f$; then $E(f_n) \uparrow E(f)$.

PROOF. Take $g_n = f$ for all $n$ in the theorem.

The class of positive $\mathcal{F}$-measurable functions will be denoted by $\mathcal{F}_+$.
Such a function can be approximated in various ways by basic functions. It is
nice to do so by an increasing sequence, and of course we should approximate
all functions in $\mathcal{F}_+$ in the same way. We choose a particular mode as follows.

Define a function on $[0, \infty]$ by the (uncommon) symbol $)\ ]$:

$)0] = 0; \quad )\infty] = \infty;$

$)x] = n - 1$ for $x \in (n-1, n]$, $n \in \mathbb{N}$.

Thus $)\pi] = 3$, $)4] = 3$. Next we define for any $f \in \mathcal{F}_+$ the approximating
sequence $\{f^{(m)}\}$, $m \in \mathbb{N}$, by

(40) $f^{(m)}(\omega) = \frac{)2^m f(\omega)]}{2^m}.$

Each $f^{(m)}$ is a basic function with range in the set of dyadic (binary) numbers:
$\{k/2^m\}$ where $k$ is a nonnegative integer or $\infty$. We have $f^{(m)} \le f^{(m+1)}$ for
all $m$, by the magic property of bisection. Finally $f^{(m)} \uparrow f$ owing to the
left-continuity of the function $x \to )x]$.
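
The approximation (40) can be tried out numerically. The following Python sketch is only an illustration: the helper names `left_floor` and `dyadic_approx` are our own and do not appear in the text, and the sketch evaluates $f^{(m)}$ at a single point where $f$ takes a given value, rather than for a function on $\Omega$.

```python
import math

def left_floor(x):
    """The symbol )x]: )0] = 0, )inf] = inf, and )x] = n - 1 for x in (n-1, n]."""
    if x == 0:
        return 0
    if math.isinf(x):
        return math.inf
    return math.ceil(x) - 1

def dyadic_approx(value, m):
    """Value of f^(m) at a point where f takes the given value in [0, inf]."""
    return left_floor(2**m * value) / 2**m

# Approximations of the value 1: each is strictly below 1 and they increase to 1.
approximations = [dyadic_approx(1.0, m) for m in range(6)]
print(approximations)
```

The value 1 is a convenient test case: it sits at the right end of the interval $(0, 1]$, so every approximation stays strictly below it, and the left-continuity of $x \to )x]$ is what lets the sequence climb up to 1.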


DEFINITION 8(b). For $f \in \mathcal{F}_+$, its integral is defined to be

(41) $E(f) = \lim_m \uparrow E(f^{(m)}).$

When $f$ is basic, Definition 8(b) is consistent with 8(a), by Corollary to
Theorem 8. The extension of property (ii) of integrals to $\mathcal{F}_+$ is trivial, because
$f \le g$ implies $f^{(m)} \le g^{(m)}$. On the contrary, $(f+g)^{(m)}$ is not $f^{(m)} + g^{(m)}$, but
since $f^{(m)} + g^{(m)} \uparrow (f + g)$, it follows from Theorem 8 that

$\lim_m \uparrow E(f^{(m)} + g^{(m)}) = \lim_m \uparrow E((f+g)^{(m)}),$

which yields property (iii) for $\mathcal{F}_+$, together with $E(a f^{(m)}) \uparrow a E(f)$, for $a \ge 0$.


Property (iv) for $\mathcal{F}_+$ will be given in an equivalent form as follows.

(iv) For $f \in \mathcal{F}_+$, the function of sets defined on $\mathcal{F}$ by

$A \to E(A; f)$

is a measure.

PROOF. We need only prove that if $A = \bigcup_{n=1}^{\infty} A_n$ where the $A_n$'s are
disjoint sets in $\mathcal{F}$, then

$E(A; f) = \sum_{n=1}^{\infty} E(A_n; f).$

For a basic $f$, this follows from properties (iii) and (iv). The extension to $\mathcal{F}_+$
can be done by the double limit theorem and is left as an exercise.


There are three fundamental theorems relating the convergence of functions
with the convergence of their integrals. We begin with Beppo Levi's
theorem on Monotone Convergence (1906), which is the extension of Corollary
to Theorem 8 to $\mathcal{F}_+$.

Theorem 9. Let $\{f_n\}$ be an increasing sequence of functions in $\mathcal{F}_+$ with
limit $f$: $f_n \uparrow f$. Then we have

$\lim_n \uparrow E(f_n) = E(f) \le +\infty.$

PROOF. We have $f \in \mathcal{F}_+$; hence by Definition 8(b), (41) holds. For each
$f_n$, we have, using analogous notation:

(42) $\lim_m \uparrow E(f_n^{(m)}) = E(f_n).$

Since $f_n \uparrow f$, the numbers $)2^m f_n(\omega)] \uparrow )2^m f(\omega)]$ as $n \uparrow \infty$, owing to the
left continuity of $x \to )x]$. Hence by Corollary to Theorem 8,

(43) $\lim_n \uparrow E(f_n^{(m)}) = E(f^{(m)}).$

It follows that

$\lim_m \uparrow \lim_n \uparrow E(f_n^{(m)}) = \lim_m \uparrow E(f^{(m)}) = E(f).$

On the other hand, it follows from (42) that

$\lim_n \uparrow \lim_m \uparrow E(f_n^{(m)}) = \lim_n \uparrow E(f_n).$

Therefore the theorem is proved by the double limit lemma.


From Theorem 9 we derive Lebesgue’s theorem in its pristine positive 
guise. 


Theorem 10. Let $f_n \in \mathcal{F}_+$, $n \in \mathbb{N}$. Suppose

(a) $\lim_n f_n = 0$;
(b) $E(\sup_n f_n) < \infty$.

Then we have

(44) $\lim_n E(f_n) = 0.$

PROOF. Put for $n \in \mathbb{N}$:

(45) $g_n = \sup_{k \ge n} f_k.$

Then $g_n \in \mathcal{F}_+$, and as $n \uparrow \infty$, $g_n \downarrow \limsup_n f_n = 0$ by (a); and $g_1 = \sup_n f_n$
so that $E(g_1) < \infty$ by (b).

Now consider the sequence $\{g_1 - g_n\}$, $n \in \mathbb{N}$. This is increasing with limit
$g_1$. Hence by Theorem 9, we have

$\lim_n \uparrow E(g_1 - g_n) = E(g_1).$

By property (iii) for $\mathcal{F}_+$,

$E(g_1 - g_n) + E(g_n) = E(g_1).$

Substituting into the preceding relation and cancelling the finite $E(g_1)$, we
obtain $E(g_n) \downarrow 0$. Since $0 \le f_n \le g_n$, so that $0 \le E(f_n) \le E(g_n)$ by property
(ii) for $\mathcal{F}_+$, (44) follows.
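
Condition (b) cannot simply be dropped. As an illustration that is not taken from the text, consider on $(0, 1]$ with Lebesgue measure the functions $f_n = n\,1_{(0, 1/n]}$: they tend to 0 everywhere, yet each has integral 1, and $\sup_n f_n$ equals $k$ on $(1/(k+1), 1/k]$, so that its integral is a divergent harmonic-type series. The sketch below (function names are our own) computes these quantities exactly with rational arithmetic.

```python
from fractions import Fraction

def integral_f(n):
    """E(f_n) for f_n = n * 1_{(0, 1/n]} under Lebesgue measure on (0, 1]."""
    return n * Fraction(1, n)

def integral_sup_truncated(K):
    """Integral of sup_n f_n over (1/(K+1), 1]; the sup equals k on (1/(k+1), 1/k]."""
    return sum(k * (Fraction(1, k) - Fraction(1, k + 1)) for k in range(1, K + 1))

# Every f_n has integral 1, so E(f_n) does not tend to 0.
print([integral_f(n) for n in (1, 10, 100)])

# The truncated integrals of sup_n f_n grow without bound (roughly like log K).
print([float(integral_sup_truncated(K)) for K in (10, 100, 1000)])
```

Since $\sup_n f_n$ dominates each of its truncations, the unbounded growth of the second list shows $E(\sup_n f_n) = +\infty$, which is exactly the failure of (b).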

The next result is known as Fatou’s lemma, of the same vintage 1906 
as Beppo Levi’s. It has the virtue of “no assumptions” with the consequent 
one-sided conclusion, which is however often useful. 


Theorem 11. Let $\{f_n\}$ be an arbitrary sequence of functions in $\mathcal{F}_+$. Then
we have

(46) $E(\liminf_n f_n) \le \liminf_n E(f_n).$

PROOF. Put for $n \in \mathbb{N}$:

$g_n = \inf_{k \ge n} f_k,$

then

$\liminf_n f_n = \lim_n \uparrow g_n.$

Hence by Theorem 9,

(47) $E(\liminf_n f_n) = \lim_n \uparrow E(g_n).$

Since $g_n \le f_n$, we have $E(g_n) \le E(f_n)$ and

$\liminf_n E(g_n) \le \liminf_n E(f_n).$

The left member above is in truth the right member of (47); therefore (46)
follows as a milder but neater conclusion.
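
The inequality (46) can be strict. A small example of our own, not from the text: on the two-point space $\{0, 1\}$ with counting measure, let $f_n$ be the indicator of the point $n \bmod 2$. Then $\liminf_n f_n = 0$ while $E(f_n) = 1$ for every $n$. The sketch below checks this over a finite horizon `N`, which suffices here only because the sequence is periodic; that shortcut is an assumption of the sketch, not a general method.

```python
def f(n):
    """f_n on the two-point space {0, 1}: the indicator of the point n mod 2."""
    return [1.0 if w == n % 2 else 0.0 for w in (0, 1)]

def integral(values):
    """Integral with respect to counting measure on {0, 1}."""
    return sum(values)

N = 50  # finite horizon; the sequence has period 2, so every tail looks the same

# g_n = inf_{k >= n} f_k, computed pointwise over the finite tail n..N-1.
tail_infs = [
    [min(f(k)[w] for k in range(n, N)) for w in (0, 1)]
    for n in range(N - 1)
]

# liminf f_n = sup_n g_n (pointwise), and liminf E(f_n) = sup_n inf_{k >= n} E(f_k).
liminf_f = [max(g[w] for g in tail_infs) for w in (0, 1)]
liminf_of_integrals = max(
    min(integral(f(k)) for k in range(n, N)) for n in range(N - 1)
)

print(integral(liminf_f), liminf_of_integrals)
```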


We have derived Theorem 11 from Theorem 9. Conversely, it is easy to
go the other way. For if $f_n \uparrow f$, then (46) yields $E(f) \le \lim_n \uparrow E(f_n)$. Since
$f \ge f_n$, $E(f) \ge \lim_n \uparrow E(f_n)$; hence there is equality.

We can also derive Theorem 10 directly from Theorem 11. Using the
notation in (45), we have $0 \le g_1 - f_n \le g_1$. Hence by condition (a) and (46),

$E(g_1) = E(\liminf_n (g_1 - f_n)) \le \liminf_n (E(g_1) - E(f_n)) = E(g_1) - \limsup_n E(f_n),$

which yields (44).
The three theorems 9, 10, and 11 are intimately woven.

We proceed to the final stage of the integral. For any $f \in \mathcal{F}$ with range
in $[-\infty, +\infty]$, put

$f^+ = \begin{cases} f & \text{on } \{f \ge 0\}, \\ 0 & \text{on } \{f < 0\}; \end{cases} \qquad f^- = \begin{cases} -f & \text{on } \{f \le 0\}, \\ 0 & \text{on } \{f > 0\}. \end{cases}$

Then $f^+ \in \mathcal{F}_+$, $f^- \in \mathcal{F}_+$, and

$f = f^+ - f^-; \quad |f| = f^+ + f^-.$

By Definition 8(b) and property (iii):

(48) $E(|f|) = E(f^+) + E(f^-).$

DEFINITION 8(c). For $f \in \mathcal{F}$, its integral is defined to be

(49) $E(f) = E(f^+) - E(f^-),$

provided the right side above is defined, namely not $\infty - \infty$. We say $f$ is
integrable, or $f \in L^1$, iff both $E(f^+)$ and $E(f^-)$ are finite; in this case $E(f)$
is a finite number. When $E(f)$ exists but $f$ is not integrable, then it must be
equal to $+\infty$ or $-\infty$, by (49).

A set $A$ in $\Omega$ is called a null set iff $A \in \mathcal{F}$ and $\mu(A) = 0$. A mathematical
proposition is said to hold almost everywhere, or a.e., iff there is a null set $A$
such that it holds outside $A$, namely in $A^c$.

A number of important observations are collected below. 


Theorem 12. (i) The function $f$ in $\mathcal{F}$ is integrable if and only if $|f|$ is
integrable; we have

(50) $|E(f)| \le E(|f|).$

(ii) For any $f \in \mathcal{F}$ and any null set $A$, we have

(51) $E(A; f) = \int_A f\,d\mu = 0, \qquad E(f) = E(A^c; f) = \int_{A^c} f\,d\mu.$

(iii) If $f \in L^1$, then the set $\{\omega \in \Omega : |f(\omega)| = \infty\}$ is a null set.

(iv) If $f \in L^1$, $g \in \mathcal{F}$, and $|g| \le |f|$ a.e., then $g \in L^1$.

(v) If $f \in \mathcal{F}$, $g \in \mathcal{F}$, and $g = f$ a.e., then $E(g)$ exists if and only if $E(f)$
exists, and then $E(g) = E(f)$.

(vi) If $\mu(\Omega) < \infty$, then any a.e. bounded $\mathcal{F}$-measurable function is integrable.


PROOF. (i) is trivial from (48) and (49); (ii) follows from

$1_A |f| \le 1_A \cdot \infty$

so that

$0 \le E(1_A |f|) \le E(1_A \cdot \infty) = \mu(A) \cdot \infty = 0.$

This implies (51).

To prove (iii), let

$A(n) = \{|f| \ge n\}.$

Then $A(n) \in \mathcal{F}$ and

$n\,\mu(A(n)) = E(A(n); n) \le E(A(n); |f|) \le E(|f|).$

Hence

(52) $\mu(A(n)) \le \frac{1}{n}\, E(|f|).$

Letting $n \uparrow \infty$, so that $A(n) \downarrow \{|f| = \infty\}$; since $\mu(A(1)) \le E(|f|) < \infty$, we
have by property (f) of the measure $\mu$:

$\mu(\{|f| = \infty\}) = \lim_n \downarrow \mu(A(n)) = 0.$

To prove (iv), let $|g| \le |f|$ on $A^c$, where $\mu(A) = 0$. Then

$|g| \le 1_A \cdot \infty + 1_{A^c} \cdot |f|$

and consequently

$E(|g|) \le \mu(A) \cdot \infty + E(A^c; |f|) \le 0 \cdot \infty + E(|f|) = E(|f|).$

Hence $g \in L^1$ if $f \in L^1$.

The proof of (v) is similar to that of (iv) and is left as an exercise. The
assertion (vi) is a special case of (iv) since a constant is integrable when
$\mu(\Omega) < \infty$.


Remark. A case of (52) is known as Chebyshev's inequality; see p. 48
of the main text. Indeed, it can be strengthened as follows:

(53) $\lim_n n\,\mu(A(n)) \le \lim_n E(A(n); |f|) = 0.$

This follows from property (f) of the measure

$A \to E(A; |f|);$

see property (iv) of the integral for $\mathcal{F}_+$.

There is also a strengthening of (ii), as follows.
If $B_k \in \mathcal{F}$ and $\mu(B_k) \to 0$ as $k \to \infty$, then

$\lim_k E(B_k; f) = 0.$

To prove this we may suppose, without loss of generality, that $f \in \mathcal{F}_+$. We
have then

$E(B_k; f) = E(B_k \cap A(n); f) + E(B_k \cap A(n)^c; f) \le E(A(n); f) + n\,\mu(B_k).$

Hence

$\limsup_k E(B_k; f) \le E(A(n); f)$

and the result follows by letting $n \to \infty$ and using (53).

It is convenient to define, for any $f$ in $\mathcal{F}$, a class of functions denoted
by $C(f)$, as follows: $g \in C(f)$ iff $g = f$ a.e. When $(\Omega, \mathcal{F}, \mu)$ is a complete
measure space, such a $g$ is automatically in $\mathcal{F}$. To see this, let $B = \{g \ne f\}$.
Our definition of "a.e." means only that $B$ is a subset of a null set $A$; in plain
English this does not say whether $g$ is equal to $f$ or not anywhere in $A - B$.
However if the measure is complete, then any subset of a null set is also a
null set, so that not only the set $B$ but all its subsets are null, hence in $\mathcal{F}$.
Hence for any real number $c$,

$\{g \le c\} = \{g = f;\ g \le c\} \cup \{g \ne f;\ g \le c\}$

belongs to $\mathcal{F}$, and so $g \in \mathcal{F}$.

A member of $C(f)$ may be called a version of $f$, and may be substituted
for $f$ wherever a null set "does not count". This is the point of (iv) and (v) in
Theorem 12. Note that when the measure space is complete, the assumption
"$g \in \mathcal{F}$" there may be omitted. A particularly simple version of $f$ is the
following finite version:

$f^* = \begin{cases} f & \text{on } \{|f| < \infty\}, \\ 0 & \text{on } \{|f| = \infty\}; \end{cases}$

where 0 may be replaced by some other number, e.g., by 1 in $E(\log f)$.

In functional analysis, it is the class $C(f)$ rather than an individual $f$
that is a member of $L^1$.

As examples of the preceding remarks, let us prove properties (ii) and
(iii) for integrable functions.

(ii) If $f \in L^1$, $g \in L^1$, and $f \le g$ a.e., then

$E(f) \le E(g).$

PROOF. We have, except on a null set:

$f^+ - f^- \le g^+ - g^-$

but we cannot transpose terms that may be $+\infty$! Now substitute finite versions
of $f$ and $g$ in the above (without changing their notation) and then transpose
as follows:

$f^+ + g^- \le g^+ + f^-.$

Applying properties (ii) and (iii) for $\mathcal{F}_+$, we obtain

$E(f^+) + E(g^-) \le E(g^+) + E(f^-).$

By the assumptions of $L^1$, all the four quantities above are finite numbers.
Transposing back we obtain the desired conclusion.

(iii) If $f \in L^1$, $g \in L^1$, then $f + g \in L^1$, and

$E(f + g) = E(f) + E(g).$

Let us leave this as an exercise. If we assume only that both $E(f)$ and
$E(g)$ exist and that the right member in the equation above is defined, namely
not $(+\infty) + (-\infty)$ or $(-\infty) + (+\infty)$, does $E(f + g)$ then exist and equal
the sum? We leave this as a good exercise for the curious, and return to
Theorem 10 in its practical form.

Theorem 10. Let $f_n \in \mathcal{F}$; suppose

(a) $\lim_n f_n = f$ a.e.;

(b) there exists $g \in L^1$ such that for all $n$:

$|f_n| \le g$ a.e.

Then we have

(c) $\lim_n E(|f_n - f|) = 0.$

PROOF. Observe first that

$|\lim_n f_n| \le \sup_n |f_n|;$

$|f_n - f| \le |f_n| + |f| \le 2 \sup_n |f_n|;$

provided the left members are defined. Since the union of a countable collection
of null sets is a null set, under the hypotheses (a) and (b) there is a null set
$A$ such that on $\Omega - A$, we have $\sup_n |f_n| \le g$; hence by Theorem 12 (iv), all
$|f_n|$, $|f|$, $|f_n - f|$ are integrable, and therefore we can substitute their finite
versions without affecting their integrals, and moreover $\lim_n |f_n - f| = 0$ on
$\Omega - A$. (Remember that $f_n - f$ need not be defined before the substitutions!)
By using Theorem 12 (ii) once more if need be, we obtain the conclusion (c)
from the positive version of Theorem 10.

This theorem is known as Lebesgue's dominated convergence theorem,
vintage 1908. When $\mu(\Omega) < \infty$, any constant $C$ is integrable and may be used
for $g$; hence in this case the result is called the bounded convergence theorem.
Curiously, the best known part of the theorem is the corollary below with a
fixed $B$.

Corollary. We have

$\lim_n \int_B f_n\,d\mu = \int_B f\,d\mu$

uniformly in $B \in \mathcal{F}$.

This is trivial from (c), because, in alternative notation:

$|E(B; f_n) - E(B; f)| \le E(B; |f_n - f|) \le E(|f_n - f|).$

In the particular case where $B = \Omega$, the Corollary contains a number of useful
results such as the integration term by term of power series or Fourier series.
A glimpse of this is given below.


5 Applications 


The general theory of integration applied to a probability space is summarized
in §§3.1-3.2 of the main text. The specialization to $\mathbb{R}$ expounded in §3 above
will now be described and illustrated.

A function $f$ defined on $\mathbb{R}$ with range in $[-\infty, +\infty]$ is called a Borel
function iff $f \in \mathcal{B}$; it is called a Lebesgue-measurable function iff $f \in \mathcal{B}^*$.
The domain of definition of $f$ may be an arbitrary Borel set or Lebesgue-measurable
set $D$. This case is reduced to that for $D = \mathbb{R}$ by extending the
definition of $f$ to be zero outside $D$. The integral of $f \in \mathcal{B}^*$ corresponding
to the measure $m^*$ constructed from $F$ is denoted by

$E(f) = \int_{-\infty}^{\infty} f(x)\,dF(x).$

In case $F(x) = x$, this is called the Lebesgue integral of $f$; in this case the
usual notation is, for $A \in \mathcal{B}^*$:

$\int_A f(x)\,dx = E(A; f).$

Below are some examples of the application of preceding theorems to classic
analysis.


Example 1. Let $I$ be a bounded interval in $\mathbb{R}$; $\{u_k\}$ a sequence of functions on $I$; and
for $x \in I$:

$s_n(x) = \sum_{k=1}^{n} u_k(x), \quad n \in \mathbb{N}.$

Suppose the infinite series $\sum_k u_k(x)$ converges in $I$; then in the usual notation:

$\lim_{n \to \infty} \sum_{k=1}^{n} u_k(x) = \sum_{k=1}^{\infty} u_k(x) = s(x)$

exists and is finite. Now suppose each $u_k$ is Lebesgue-integrable, then so is each $s_n$,
by property (iii) of the integral; and

$\int_I s_n(x)\,dx = \sum_{k=1}^{n} \int_I u_k(x)\,dx.$

Question: does the numerical series above converge? and if so is the sum of integrals
equal to the integral of the sum:

$\sum_{k=1}^{\infty} \int_I u_k(x)\,dx = \int_I \sum_{k=1}^{\infty} u_k(x)\,dx = \int_I s(x)\,dx?$

This is the problem of integration term by term.

A very special but important case is when the interval $I = [a, b]$ is compact
and the functions $u_k$ are all continuous in $I$. If we assume that the series $\sum_{k=1}^{\infty} u_k(x)$
converges uniformly in $I$, then it follows from elementary analysis that the sequence
of partial sums $\{s_n(x)\}$ is totally bounded, that is,

$\sup_n \sup_x |s_n(x)| = \sup_x \sup_n |s_n(x)| < \infty.$

Since $m(I) < \infty$, the bounded convergence theorem applies to yield

$\lim_n \int_I s_n(x)\,dx = \int_I \lim_n s_n(x)\,dx.$

The Taylor series of an analytic function always converges uniformly and absolutely
in any compact subinterval of its interval of convergence. Thus the result above
is fruitful.

Another example of term-by-term integration goes back to Theorem 8. 


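Example 2. Let $u_k \ge 0$, $u_k \in L^1$, then

(54) $\int_a^b \left(\sum_{k=1}^{\infty} u_k\right) d\mu = \sum_{k=1}^{\infty} \int_a^b u_k\,d\mu.$

Let $f_n = \sum_{k=1}^{n} u_k$, then $f_n \in L^1$, $f_n \uparrow f = \sum_{k=1}^{\infty} u_k$. Hence by monotone convergence

$E(f) = \lim_n E(f_n),$

that is (54).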
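
As a numerical check of (54) on a case of our own choosing (not from the text), take $u_k(x) = x^k$, $k \ge 1$, on $[0, 1/2]$ with Lebesgue measure. Each integral is $(1/2)^{k+1}/(k+1)$, while $\sum_{k \ge 1} x^k = x/(1-x)$ has the primitive $-x - \log(1-x)$, so both members of (54) should equal $\log 2 - 1/2$. The function names below are hypothetical.

```python
import math

def integral_u(k, b=0.5):
    """Exact integral of u_k(x) = x**k over [0, b]."""
    return b ** (k + 1) / (k + 1)

def sum_of_integrals(n, b=0.5):
    """Partial sum of the right member of (54), using the first n terms."""
    return sum(integral_u(k, b) for k in range(1, n + 1))

# Left member of (54): the sum over k >= 1 of x**k is x / (1 - x),
# whose primitive is -x - log(1 - x); evaluate it between 0 and 1/2.
integral_of_sum = -0.5 - math.log(1 - 0.5)

for n in (5, 10, 40):
    print(n, sum_of_integrals(n), integral_of_sum)
```

The partial sums increase with $n$, as they must for positive terms, and approach the integral of the sum from below.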
When $u_k$ is general the preceding result may be applied to $|u_k|$ to obtain

$\int \left(\sum_{k=1}^{\infty} |u_k|\right) d\mu = \sum_{k=1}^{\infty} \int |u_k|\,d\mu.$

If this is finite, then the same is true when $|u_k|$ is replaced by $u_k^+$ and $u_k^-$. It then
follows by subtraction that (54) is also true. This result of term-by-term integration
may be regarded as a special case of the Fubini–Tonelli theorem (pp. 63–64), where
one of the measures is the counting measure on $\mathbb{N}$.

For another perspective, we will apply the Borel–Lebesgue theory of integral to
the older Riemann context.


Example 3. Let $(I, \mathcal{B}^*, m)$ be as in the preceding example, but let $I = [a, b]$ be
compact. Let $f$ be a continuous function on $I$. Denote by $P$ a partition of $I$ as follows:

$a = x_0 < x_1 < x_2 < \cdots < x_n = b;$

and put

$\delta(P) = \max_{1 \le k \le n} (x_k - x_{k-1}).$

For each $k$, choose a point $\xi_k$ in $[x_{k-1}, x_k]$, and define a function $f_P$ as follows:

$f_P(x) = \begin{cases} f(\xi_1) & \text{for } x \in [x_0, x_1], \\ f(\xi_k) & \text{for } x \in (x_{k-1}, x_k],\ 2 \le k \le n. \end{cases}$

Particular choices of $\xi_k$ are: $\xi_k = x_{k-1}$; $\xi_k = x_k$; and those for which

(55) $f(\xi_k) = \min_{x_{k-1} \le x \le x_k} f(x); \qquad f(\xi_k) = \max_{x_{k-1} \le x \le x_k} f(x).$

The $f_P$ is called a step function; it is an approximant of $f$. It is not basic by Definition
8(a) but $f_P^+$ and $f_P^-$ are. Hence by Definitions 8(a) and 8(c), we have

$E(f_P) = \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}).$

The sum above is called a Riemann sum; when the $\xi_k$ are chosen as in (55), they are
called lower and upper sums, respectively.
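
To see lower and upper sums at work, here is a small sketch under assumptions of our own: $f(x) = x^2$ on $I = [0, 1]$ with the uniform partition into $n$ parts. Because this $f$ is increasing, the minimum and maximum in (55) are attained at the left and right endpoints of each subinterval. Exact rational arithmetic is used so that the bracketing of the integral $1/3$ is not blurred by rounding.

```python
from fractions import Fraction

def lower_upper_sums(n):
    """Lower and upper Riemann sums of f(x) = x**2 on [0, 1], uniform partition.

    Since f is increasing on [0, 1], its minimum over [x_{k-1}, x_k] is
    f(x_{k-1}) and its maximum is f(x_k).
    """
    points = [Fraction(k, n) for k in range(n + 1)]
    lower = sum(points[k - 1] ** 2 * (points[k] - points[k - 1]) for k in range(1, n + 1))
    upper = sum(points[k] ** 2 * (points[k] - points[k - 1]) for k in range(1, n + 1))
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = lower_upper_sums(n)
    print(n, float(lower), float(upper))
```

For this particular $f$ the gap between upper and lower sums telescopes to exactly $1/n$, so both are squeezed onto $1/3$ as the mesh $\delta(P) = 1/n$ tends to 0.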
Now let $\{P(n), n \in \mathbb{N}\}$ be a sequence of partitions such that $\delta(P(n)) \to 0$ as
$n \to \infty$. Since $f$ is continuous on a compact set, it is bounded. It follows that there
is a constant $C$ such that

$\sup_{n \in \mathbb{N}} \sup_{x \in I} |f_{P(n)}(x)| < C.$

Since $I$ is bounded and $f_{P(n)} \to f$ on $I$ by the continuity of $f$, we can apply the
bounded convergence theorem to conclude that

$\lim_n E(f_{P(n)}) = E(f).$

The finite existence of the limit above signifies the Riemann-integrability of $f$, and the
limit is then its Riemann-integral $\int_a^b f(x)\,dx$. Thus we have proved that a continuous
function on a compact interval is Riemann-integrable, and its Riemann-integral is equal
to the Lebesgue integral. Let us recall that in the new theory, any bounded measurable
function is integrable over any bounded measurable set. For example, the function

$\sin\frac{1}{x}, \quad x \in (0, 1]$

being bounded by 1 is integrable. But from the strict Riemannian point of view it
has only an "improper" integral because $(0, 1]$ is not closed and the function is not
continuous on $[0, 1]$, indeed it is not definable there. Yet the limit

$\lim_{\varepsilon \downarrow 0} \int_{\varepsilon}^{1} \sin\frac{1}{x}\,dx$

exists and can be defined to be $\int_0^1 \sin(1/x)\,dx$. As a matter of fact, the Riemann sums
do converge despite the unceasing oscillation of $f$ between $-1$ and $1$ as $x \downarrow 0$.


Example 4. The Riemann integral of a function on $(0, \infty)$ is called an "infinite
integral" and is definable as follows:

$\int_0^{\infty} f(x)\,dx = \lim_{n \to \infty} \int_0^{n} f(x)\,dx$

when the limit exists and is finite. A famous example is

(56) $f(x) = \frac{\sin x}{x}, \quad x \in (0, \infty).$

This function is bounded by 1 and is continuous. It can be extended to $[0, \infty)$ by
defining $f(0) = 1$ by continuity. A cute calculation (see §6.2 of main text) yields the
result (useful in Optics):

$\lim_n \int_0^{n} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$

By contrast, the function $|f|$ is not Lebesgue-integrable. To show this, we use
trigonometry:

$\left(\frac{\sin x}{x}\right)^+ \ge \frac{1}{\sqrt{2}} \cdot \frac{1}{(2n+1)\pi} = c_n \quad \text{for } x \in \left(2n\pi + \frac{\pi}{4},\ 2n\pi + \frac{3\pi}{4}\right) = I_n.$

Thus for $x > 0$:

$\left(\frac{\sin x}{x}\right)^+ \ge \sum_{n=0}^{\infty} c_n 1_{I_n}(x).$

The right member above is a basic function, with its integral:

$\sum_n c_n\, m(I_n) = \sum_n \frac{1}{2\sqrt{2}\,(2n+1)} = +\infty.$
It follows that $E(f^+) = +\infty$. Similarly $E(f^-) = +\infty$; therefore by Definition 8(c)
$E(f)$ does not exist! This example is a splendid illustration of the following
Non-Theorem.

Let $f \in \mathcal{B}$ and $f_n = f 1_{(0,n)}$, $n \in \mathbb{N}$. Then $f_n \in \mathcal{B}$ and $f_n \to f$ as $n \to \infty$.
Even when the $f_n$'s are "totally bounded", it does not follow that

(57) $\lim_n E(f_n) = E(f);$

indeed $E(f)$ may not exist.

On the other hand, if we assume, in addition, either (a) $f \ge 0$; or (b) $E(f)$
exists, in particular $f \in L^1$; then the limit relation will hold, by Theorems 9 and 10,
respectively. The next example falls in both categories.


Example 5. The square of the function $f$ in (56):

$f(x)^2 = \left(\frac{\sin x}{x}\right)^2, \quad x \in \mathbb{R}$

is integrable in the Lebesgue sense, and is also improperly integrable in the Riemann
sense.

We have

$f(x)^2 \le 1_{(-1,+1)} + 1_{(-\infty,-1] \cup [+1,+\infty)}\, \frac{1}{x^2}$

and the function on the right side is integrable, hence so is $f^2$.
Incredibly, we have

$\int_0^{\infty} \left(\frac{\sin x}{x}\right)^2 dx = \frac{\pi}{2} = (RI) \int_0^{\infty} \frac{\sin x}{x}\,dx,$

where we have inserted an "RI" to warn against taking the second integral as a
Lebesgue integral. See §6.2 for the calculation. So far as I know, nobody has explained
the equality of these two integrals.


Example 6. The most notorious example of a simple function that is not Riemann-integrable
and that baffled a generation of mathematicians is the function $1_{\mathbb{Q}}$, where $\mathbb{Q}$
is the set of rational numbers. Its Riemann sums can be made to equal any real number
between 0 and 1, when we confine $\mathbb{Q}$ to the unit interval $(0, 1)$. The function is so
totally discontinuous that the Riemannian way of approximating it, horizontally so to
speak, fails utterly. But of course it is ludicrous even to consider this indicator function
rather than the set $\mathbb{Q}$ itself. There was a historical reason for this folly: integration was
regarded as the inverse operation to differentiation, so that to integrate was meant to
"find the primitive" whose derivative is to be the integrand.
A primitive is called "indefinite integral", and $\int_1^2 (1/x)\,dx$ e.g. is called a "definite
integral." Thus the unsolvable problem was to find $\int_0^{\xi} 1_{\mathbb{Q}}(x)\,dx$, $0 < \xi < 1$.

The notion of measure as length, area, and volume is much more ancient than
Newton's fluxion (derivative), not to mention the primitive measure of counting with
fingers (and toes). The notion of "countable additivity" of a measure, although seemingly
natural and facile, somehow did not take hold until Borel saw that

$m(\mathbb{Q}) = \sum_{q \in \mathbb{Q}} m(q) = \sum_{q \in \mathbb{Q}} 0 = 0.$

There can be no question that the "length" of a single point $q$ is zero. Euclid gave it
"zero dimension".

This is the beginning of MEASURE. An INTEGRAL is a weighted measure,
as is obvious from Definition 8(a). The rest is approximation, vertically as in Definition
8(b), and convergence, as in all analysis.

As for the connexion with differentiation, Lebesgue made it, and a clue is given
in §1.3 of the main text.

Stone–Weierstrass theorem, 92, 198
Stopped (super) martingale, 340-341
Stopping time, see Optional sampling
Subdistribution function, 88
Submartingale, 335
    convergence theorems for, 350
    convex transformation of, 336
    Doob decomposition, 337
    inequalities, 346
Subprobability measure, 85
Subsequences, method of, 109
Superharmonic function, 361
Supermartingale, 335
    optional sampling, 340
Support of d.f., 10
Support of p.m., 32
Symmetric difference (△), 16
Symmetric random variable, 165
Symmetric stable law, 193, 250
Symmetrization, 155
System theorem, see Gambling system, Submartingale


T 


Tauberian theorem, 292 

Taylor’s series, 177 

Three series theorem, 125 

Tight, 95 

Topological measure space, 28 
Trace, 23 

Transition probability function, 331 
Trial, 57 

Truncation, 115 

Types of d.f., 184 


U 


Uncorrelated, 107
Uniform distribution, 30
Uniform integrability, 99
Uniqueness theorem
    for ch.f., 160, 168
    for Laplace transform, 199, 201
    for measure, 30
Upcrossing inequality, 348


V


Vague convergence, 85, 92
    general criterion, 96
Variance, 49


W


Wald's equation, 144, 345
Weak convergence, 73
Weierstrass theorem, 145
Wiener–Hopf technique, 291


Z


Zero-or-one law
    Hewitt–Savage, 268
    Kolmogorov, 267
    Lévy, 357


